1
Arango NK, Morgante F. Comparing statistical learning methods for complex trait prediction from gene expression. bioRxiv 2024:2024.06.01.596951. [PMID: 38895364 PMCID: PMC11185554 DOI: 10.1101/2024.06.01.596951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Accurate prediction of complex traits is an important task in quantitative genetics that has become increasingly relevant for personalized medicine. Genotypes have traditionally been used for trait prediction using a variety of methods such as mixed models, Bayesian methods, penalized regressions, dimension reductions, and machine learning methods. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, only a few prediction methods were used in these studies. Thus, a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance from gene expression in the two sexes separately. The methods considered differ in assumptions about the distribution of gene effect sizes - ranging from models that assume that every gene affects the trait to more sparse models - and their ability to capture gene-gene interactions. We also used functional annotation (i.e., Gene Ontology (GO)) as an external source of biological information to inform prediction models. The results show that differences in prediction accuracy between methods exist, although they are generally not large. Methods performing variable selection gave higher accuracy in females while methods assuming a more polygenic architecture performed better in males. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. Biological significance extended to the genes underlying highly predictive GO terms with different genes emerging between sexes. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes. 
Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
Affiliation(s)
- Noah Klimkowski Arango
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Fabio Morgante
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
2
Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. Mol Plant 2024; 17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]
Abstract
Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.
Affiliation(s)
- Admas Alemu
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Johanna Åstrand
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden; Lantmännen Lantbruk, Svalöv, Sweden
- Julio Isidro Y Sánchez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Javier Fernández-Gónzalez
  - Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Campus de Montegancedo-UPM, 28223 Madrid, Spain
- Wuletaw Tadesse
  - International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
- Ramesh R Vetukuri
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Anders S Carlsson
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera México-Veracruz, Texcoco, México 52640, Mexico
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Aakash Chawade
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
3
Nowak B, Tomkowiak A, Sobiech A, Bocianowski J, Kowalczewski PŁ, Spychała J, Jamruszka T. Identification and Analysis of Candidate Genes Associated with Yield Structure Traits and Maize Yield Using Next-Generation Sequencing Technology. Genes (Basel) 2023; 15:56. [PMID: 38254946 PMCID: PMC10815399 DOI: 10.3390/genes15010056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
The main challenge of agriculture in the 21st century is the continuous increase in food production. In addition to ensuring food security, the goal of modern agriculture is the continued development and production of plant-derived biomaterials. Conventional plant breeding methods do not allow breeders to achieve satisfactory results in obtaining new varieties in a short time. Currently, advanced molecular biology tools play a significant role worldwide, markedly contributing to biological progress. The aim of this study was to identify new markers linked to candidate genes determining grain yield. Next-generation sequencing, gene association, and physical mapping were used to identify markers. An additional goal was to optimize diagnostic procedures to identify molecular markers on reference materials. As a result of the conducted research, 19 SNP markers significantly associated with yield structure traits in maize were identified. Five of these markers (28629, 28625, 28640, 28649, and 29294) are located within genes that can be considered candidate genes associated with yield traits. For two markers (28639 and 29294), different amplification products were obtained on the electropherograms. For marker 28629, a specific product of 189 bp was observed for genotypes 1, 4, and 10. For marker 29294, a specific product of 189 bp was observed for genotypes 1 and 10. Both markers can be used for the preliminary selection of well-yielding genotypes.
Affiliation(s)
- Bartosz Nowak
  - Smolice Plant Breeding Ltd., IHAR Group, Smolice 146, 63-740 Kobylin, Poland
- Agnieszka Tomkowiak
  - Department of Genetics and Plant Breeding, Poznań University of Life Sciences, Dojazd 11, 60-632 Poznań, Poland
- Aleksandra Sobiech
  - Department of Genetics and Plant Breeding, Poznań University of Life Sciences, Dojazd 11, 60-632 Poznań, Poland
- Jan Bocianowski
  - Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, Wojska Polskiego 28, 60-637 Poznań, Poland
- Przemysław Łukasz Kowalczewski
  - Department of Food Technology of Plant Origin, Poznań University of Life Sciences, Wojska Polskiego 31, 60-624 Poznań, Poland
- Julia Spychała
  - Department of Genetics and Plant Breeding, Poznań University of Life Sciences, Dojazd 11, 60-632 Poznań, Poland
- Tomasz Jamruszka
  - Department of Genetics and Plant Breeding, Poznań University of Life Sciences, Dojazd 11, 60-632 Poznań, Poland
4
Bharati R, Sen MK, Severová L, Svoboda R, Fernández-Cusimamani E. Polyploidization and genomic selection integration for grapevine breeding: a perspective. Front Plant Sci 2023; 14:1248978. [PMID: 38034577 PMCID: PMC10684766 DOI: 10.3389/fpls.2023.1248978] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/27/2023] [Accepted: 10/30/2023] [Indexed: 12/02/2023]
Abstract
Grapevines are economically important woody perennial crops widely cultivated for their fruits that are used for making wine, grape juice, raisins, and table grapes. However, grapevine production is constantly facing challenges due to climate change and the prevalence of pests and diseases, causing yield reduction, lower fruit quality, and financial losses. To ease the burden, continuous crop improvement to develop superior grape genotypes with desirable traits is imperative. Polyploidization has emerged as a promising tool to generate genotypes with novel genetic combinations that can confer desirable traits such as enhanced organ size, improved fruit quality, and increased resistance to both biotic and abiotic stresses. While previous studies have shown high polyploid induction rates in Vitis spp., rigorous screening of genotypes among the produced polyploids to identify those exhibiting desired traits remains a major bottleneck. In this perspective, we propose the integration of the genomic selection approach with omics data to predict genotypes with desirable traits among the vast unique individuals generated through polyploidization. This integrated approach can be a powerful tool for accelerating the breeding of grapevines to develop novel and improved grapevine varieties.
Affiliation(s)
- Rohit Bharati
  - Department of Crop Sciences and Agroforestry, The Faculty of Tropical AgriSciences, Czech University of Life Sciences Prague, Suchdol, Czechia
- Madhab Kumar Sen
  - Department of Agroecology and Crop Production, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Suchdol, Czechia
- Lucie Severová
  - Department of Economic Theories, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czechia
- Roman Svoboda
  - Department of Economic Theories, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czechia
- Eloy Fernández-Cusimamani
  - Department of Crop Sciences and Agroforestry, The Faculty of Tropical AgriSciences, Czech University of Life Sciences Prague, Suchdol, Czechia
5
Della Coletta R, Fernandes SB, Monnahan PJ, Mikel MA, Bohn MO, Lipka AE, Hirsch CN. Importance of genetic architecture in marker selection decisions for genomic prediction. Theor Appl Genet 2023; 136:220. [PMID: 37819415 DOI: 10.1007/s00122-023-04469-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 09/25/2023] [Indexed: 10/13/2023]
Abstract
Key message: We demonstrate potential for improved multi-environment genomic prediction accuracy using structural variant markers. However, the degree of observed improvement is highly dependent on the genetic architecture of the trait. Breeders commonly use genetic markers to predict the performance of untested individuals as a way to improve the efficiency of breeding programs. These genomic prediction models have almost exclusively used single nucleotide polymorphisms (SNPs) as their source of genetic information, even though other types of markers exist, such as structural variants (SVs). Given that SVs are associated with environmental adaptation and not all of them are in linkage disequilibrium with SNPs, SVs have the potential to bring additional information to multi-environment prediction models that is not captured by SNPs alone. Here, we evaluated different marker types (SNPs and/or SVs) on prediction accuracy across a range of genetic architectures for simulated traits across multiple environments. Our results show that SVs can improve prediction accuracy, but the improvement is highly dependent on the genetic architecture of the trait and the relative gain in accuracy is minimal. When SVs are the only causative variant type, 70% of the time SV predictors outperform SNP predictors. However, the improvement in accuracy in these instances is only 1.5% on average. Further simulations with predictors in varying degrees of LD with causative variants of different types (e.g., SNPs, SVs, SNPs and SVs) showed that prediction accuracy increased as linkage disequilibrium between causative variants and predictors increased regardless of the marker type. This study demonstrates that, when deciding which markers to use in large-scale genomic prediction modeling in a breeding program, knowing the genetic architecture of a trait is more important than the type of marker used.
Affiliation(s)
- Rafael Della Coletta
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Samuel B Fernandes
  - Department of Crop, Soil and Environmental Sciences at University of Arkansas, Fayetteville, AR, 72701, USA
- Patrick J Monnahan
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- Mark A Mikel
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Martin O Bohn
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Alexander E Lipka
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
6
Boggio GM, Christensen OF, Legarra A, Meynadier A, Marie-Etancelin C. Microbiability of milk composition and genetic control of microbiota effects in sheep. J Dairy Sci 2023; 106:6288-6298. [PMID: 37474364 DOI: 10.3168/jds.2022-22948] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/26/2022] [Accepted: 02/28/2023] [Indexed: 07/22/2023]
Abstract
Recently, high-dimensional omics data have become available in larger quantities, and models have been developed that integrate them with genomics to understand in finer detail the relationship between genotype and phenotype, and thus improve the performance of genetic evaluations. Our objectives are to quantify the effect of the inclusion of microbiome data in the genetic evaluation for dairy traits in sheep, through the estimation of heritability and microbiability and of how the microbiome effect on dairy traits decomposes into genetic and nongenetic parts. In this study we analyzed milk and rumen samples of 795 Lacaune dairy ewes. We included, as phenotype, dairy traits and milk fatty acids and proteins composition; as omics measurements, 16S rRNA rumen bacterial abundances; and as genotyping, 54K SNP chip for all ewes. Two nested genomic models were used: a first model to predict the individual contributions of the genetic and microbial abundances to phenotypes, and a second model to predict the additive genetic effect of the microbial community. In addition, microbiome-wide association studies for all dairy traits were applied using the 2,059 rumen bacterial abundances, and the genetic correlations between microbiome principal components and dairy traits were estimated. Results showed that in general the inclusion of both genetic and microbiome effects did not improve the fit of the model compared with the model with the genetic effect only. In addition, for all dairy traits the total heritability was equal to the direct heritability after fitting microbiota effects, because microbiability was almost zero for most dairy traits and the heritability of the microbial community was very close to zero. Microbiome-wide association studies did not show operational taxonomic units with major effect for any of the dairy traits evaluated, and the genetic correlations between the first 5 principal components and dairy traits were low to moderate.
So far, we can conclude that, using a substantial data set of 795 Lacaune dairy ewes, rumen bacterial abundances do not provide improved genetic evaluation for dairy traits in sheep.
Affiliation(s)
- G Martinez Boggio
  - GenPhySE, Université de Toulouse, INRAE-ENVT, 31326, Castanet-Tolosan, France
- O F Christensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, DK-8000 Aarhus C, Denmark
- A Legarra
  - GenPhySE, Université de Toulouse, INRAE-ENVT, 31326, Castanet-Tolosan, France
- A Meynadier
  - GenPhySE, Université de Toulouse, INRAE-ENVT, 31326, Castanet-Tolosan, France
- C Marie-Etancelin
  - GenPhySE, Université de Toulouse, INRAE-ENVT, 31326, Castanet-Tolosan, France
7
Upton RN, Correr FH, Lile J, Reynolds GL, Falaschi K, Cook JP, Lachowiec J. Design, execution, and interpretation of plant RNA-seq analyses. Front Plant Sci 2023; 14:1135455. [PMID: 37457354 PMCID: PMC10348879 DOI: 10.3389/fpls.2023.1135455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Genomics has transformed our understanding of the genetic architecture of traits and the genetic variation present in plants. Here, we present a review of how RNA-seq can be performed to tackle research challenges addressed by plant sciences. We discuss the importance of experimental design in RNA-seq, including considerations for sampling and replication, to avoid pitfalls and wasted resources. Approaches for processing RNA-seq data include quality control and counting features, and we describe common approaches and variations. Though differential gene expression analysis is the most common analysis of RNA-seq data, we review multiple methods for assessing gene expression, including detecting allele-specific gene expression and building co-expression networks. With the production of more RNA-seq data, strategies for integrating these data into genetic mapping pipelines are of increased interest. Finally, special considerations for RNA-seq analysis and interpretation in plants are needed, due to the high genome complexity common across plants. By incorporating informed decisions throughout an RNA-seq experiment, we can increase the knowledge gained.
8
Legarra A, Christensen O. Genomic evaluation methods to include intermediate correlated features such as high-throughput or omics phenotypes. JDS Commun 2022; 4:55-60. [PMID: 36713125 PMCID: PMC9873823 DOI: 10.3168/jdsc.2022-0276] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/14/2022] [Accepted: 09/26/2022] [Indexed: 12/05/2022]
Abstract
Gene expression is supposed to be an intermediate between DNA and the phenotype, and it can be measured. Thus, for a trait, we may have intermediate measures, which are in fact a series of genetically controlled traits. Similarly, several traits may be measured or predicted using infrared spectra, accelerometers, and similar high-throughput measures that we will call "omics." Although these measurements have errors, many of them are heritable, and they may be more accurate or easier to record than the trait of interest. It is therefore important to develop methods to use intermediate measurements in selection. Here, we present methods and perspectives for selection based on massively recorded intermediate traits (omics). Recent developments allow a hierarchical integrated framework for prediction, in which a trait is partially controlled by omics. In addition, the omics measures are themselves partly controlled by genetics ("mediated breeding values") and partly by environment or residual factors. Thus, a part of the genetic determinism of a trait is mediated by omics, whereas the remaining part is not mediated, which results in "residual breeding values." In such a framework, genetic evaluations consist of 2 nested genomic BLUP-based models. In the first, the effect of omics on the trait (which can be seen as an improved estimate of the phenotype) and the residual breeding values are estimated. The second model extracts the mediated breeding values from the improved estimate of the phenotype, considering that omics themselves are heritable. The whole procedure is called GOBLUP (genomics omics BLUP) and it allows measures in only some individuals; that is, it is a "single-step"-like method. In this model, heritability is split into "mediated" and "not mediated" parts. This decomposition allows us to predict how accurate the omics measure of the trait would be compared with the direct measure. 
The ideal omics measure is heritable and explains a large part of the phenotypic variation of the trait. Ideally, this could be the case for some traits with low heritability. However, even if the omics measure explains only a small part of the phenotypic variation, when omics measurements themselves are heritable, the use of such a model would lead to more accurate selection. Expressions for upper bounds of reliability given omics measurements are also presented. More studies are needed to confirm the usefulness of omics or high-throughput prediction. Usefulness of the technology likely needs to be checked on a case-by-case basis.
Affiliation(s)
- A. Legarra
  - GenPhySE (Genetique, Physiologie et Systemes d'Elevage), INRA, 31326 Castanet-Tolosan, France (corresponding author)
- O.F. Christensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
9
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol Plant 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
  - Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
  - CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
  - MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
  - State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
  - CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
  - Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
10
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence. G3 (Bethesda) 2022; 12:jkac258. [PMID: 36161485 PMCID: PMC9635642 DOI: 10.1093/g3journal/jkac258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.
Affiliation(s)
- Bruno C Perez
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
  - Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
11
Robert P, Goudemand E, Auzanneau J, Oury FX, Rolland B, Heumez E, Bouchet S, Caillebotte A, Mary-Huard T, Le Gouis J, Rincent R. Phenomic selection in wheat breeding: prediction of the genotype-by-environment interaction in multi-environment breeding trials. Theor Appl Genet 2022; 135:3337-3356. [PMID: 35939074 DOI: 10.1007/s00122-022-04170-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
Phenomic prediction of wheat grain yield and heading date in different multi-environmental trial scenarios is accurate. Modelling the genotype-by-environment interaction effect using phenomic data is a potentially low-cost complement to genomic prediction. The performance of wheat cultivars in multi-environmental trials (MET) is difficult to predict because of the genotype-by-environment interactions (G × E). Phenomic selection is supposed to be efficient for modelling the G × E effect because it accounts for non-additive effects. Here, phenomic data are near-infrared (NIR) spectra obtained from plant material. While phenomic selection has recently been shown to accurately predict wheat grain yield in single environments, its accuracy needs to be investigated for MET. We used four datasets from two winter wheat breeding programs to test and compare the predictive abilities of phenomic and genomic models for grain yield and heading date in different MET scenarios. We also compared different methods to model the G × E using different covariance matrices based on spectra. On average, phenomic and genomic prediction abilities are similar in all different MET scenarios. Better predictive abilities were obtained when G × E effects were modelled with NIR spectra than without them, and it was better to use all the spectra of all genotypes in all environments for modelling the G × E. To facilitate the implementation of phenomic prediction, we tested MET designs where the NIR spectra were measured only on the genotype-environment combinations phenotyped for the target trait. Missing spectra were predicted with a weighted multivariate ridge regression. Intermediate predictive abilities for grain yield were obtained in a sparse testing scenario and for new genotypes, which shows that phenomic selection is an efficient and practicable prediction method for dealing with G × E.
Affiliation(s)
- Pauline Robert
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
| | - Ellen Goudemand
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
| | - Jérôme Auzanneau
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
| | - François-Xavier Oury
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
| | - Bernard Rolland
- INRAE-Agrocampus Ouest-Université Rennes 1, UMR1349, IGEPP, Domaine de la Motte, 35653, Le Rheu, France
| | - Emmanuel Heumez
- INRAE, UE 972, Grandes Cultures Innovation Environnement, 2 Chaussée Brunehaut, 80200, Estrées-Mons, France
| | - Sophie Bouchet
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
| | - Antoine Caillebotte
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
| | - Tristan Mary-Huard
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- MIA, INRAE, AgroParisTech, Université Paris-Saclay, 75005, Paris, France
| | - Jacques Le Gouis
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France
| | - Renaud Rincent
- INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Université Paris-Saclay, 91190, Gif-sur-Yvette, France.
- INRAE - Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, Clermont-Ferrand, France.
| |
|
12
|
Liang M, An B, Chang T, Deng T, Du L, Li K, Cao S, Du Y, Xu L, Zhang L, Gao X, Li J, Gao H. Incorporating kernelized multi-omics data improves the accuracy of genomic prediction. J Anim Sci Biotechnol 2022; 13:103. [PMID: 36127743 PMCID: PMC9490992 DOI: 10.1186/s40104-022-00756-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Accepted: 07/08/2022] [Indexed: 11/18/2022] Open
Abstract
Background Genomic selection (GS) has revolutionized animal and plant breeding since its first implementation, by enabling early selection before phenotypes are measured. Besides the genome, transcriptome and metabolome information are increasingly considered new sources for GS. Difficulties in building the model with multi-omics data for GS and the limit of specimen availability have both delayed the progress of investigating multi-omics. Results We utilized the Cosine kernel to map genomic and transcriptomic data as $n \times n$ symmetric matrices (G matrix and T matrix), combined with the best linear unbiased prediction (BLUP) for GS. Here, we defined five kernel-based prediction models: genomic BLUP (GBLUP), transcriptome-BLUP (TBLUP), multi-omics BLUP (MBLUP, $\boldsymbol{M} = \mathrm{ratio} \times \boldsymbol{G} + (1 - \mathrm{ratio}) \times \boldsymbol{T}$), multi-omics single-step BLUP (mssBLUP), and weighted multi-omics single-step BLUP (wmssBLUP) to integrate transcribed individuals and genotyped resource population. The predictive accuracy evaluations in four traits of the Chinese Simmental beef cattle population showed that (1) MBLUP was far preferred to GBLUP (ratio = 1.0), (2) the prediction accuracy of wmssBLUP and mssBLUP had 4.18% and 3.37% average improvement over GBLUP, and (3) the accuracy of wmssBLUP increased with the growing proportion of transcribed cattle in the whole resource population. Conclusions We concluded that the inclusion of transcriptome data in GS had the potential to improve accuracy. Moreover, wmssBLUP can be regarded as a promising alternative for the present situation in which plenty of individuals are genotyped but fewer are transcribed. Supplementary Information The online version contains supplementary material available at 10.1186/s40104-022-00756-6.
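The kernel construction described in this abstract can be illustrated with a minimal sketch. It assumes the Cosine kernel is the plain cosine similarity between individuals' feature vectors; the helper names (`cosine_kernel`, `blend`) and the toy data are hypothetical and are not taken from the paper, whose preprocessing and BLUP fitting are not reproduced here.

```python
import math

def cosine_kernel(rows):
    """Return the n x n matrix of cosine similarities between the rows."""
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]

def blend(G, T, ratio):
    """Element-wise M = ratio * G + (1 - ratio) * T."""
    return [[ratio * g + (1 - ratio) * t for g, t in zip(g_row, t_row)]
            for g_row, t_row in zip(G, T)]

# Made-up toy data: 3 animals, 4 SNP codes and 3 expression values each.
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
expression = [[5.1, 2.0, 0.3], [4.8, 2.2, 0.1], [1.0, 3.5, 2.2]]

G = cosine_kernel(genotypes)   # genomic kernel
T = cosine_kernel(expression)  # transcriptomic kernel
M = blend(G, T, ratio=0.5)     # multi-omics kernel; ratio = 1.0 recovers G
```

Setting `ratio=1.0` returns the genomic kernel unchanged, which matches the abstract's note that GBLUP corresponds to ratio = 1.0.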
Affiliation(s)
- Mang Liang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Bingxing An
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Tianpeng Chang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Tianyu Deng
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lili Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Keanning Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Sheng Cao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Yueying Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lupei Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China.
| |
|
13
|
Mbebi AJ, Breitler JC, Bordeaux M, Sulpice R, McHale M, Tong H, Toniutti L, Castillo JA, Bertrand B, Nikoloski Z. A comparative analysis of genomic and phenomic predictions of growth-related traits in 3-way coffee hybrids. G3 GENES|GENOMES|GENETICS 2022; 12:6632664. [PMID: 35792875 PMCID: PMC9434219 DOI: 10.1093/g3journal/jkac170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022]
Abstract
Genomic prediction has revolutionized crop breeding despite remaining issues of transferability of models to unseen environmental conditions and environments. Usage of endophenotypes rather than genomic markers leads to the possibility of building phenomic prediction models that can account, in part, for this challenge. Here, we compare and contrast genomic prediction and phenomic prediction models for 3 growth-related traits, namely, leaf count, tree height, and trunk diameter, from 2 coffee 3-way hybrid populations exposed to a series of treatment-inducing environmental conditions. The models are based on 7 different statistical methods built with genomic markers and ChlF data used as predictors. This comparative analysis demonstrates that the best-performing phenomic prediction models show higher predictability than the best genomic prediction models for the considered traits and environments in the vast majority of comparisons within 3-way hybrid populations. In addition, we show that phenomic prediction models are transferrable between conditions but to a lower extent between populations and we conclude that chlorophyll a fluorescence data can serve as alternative predictors in statistical models of coffee hybrid performance. Future directions will explore their combination with other endophenotypes to further improve the prediction of growth-related traits for crops.
Affiliation(s)
- Alain J Mbebi
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam , Potsdam-Golm 14476, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology , Potsdam-Golm 14476, Germany
| | - Jean-Christophe Breitler
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
| | - Mélanie Bordeaux
- Fundación Nicafrance , Finca La Cumplida Km. 147 Carretera Matagalpa - La Dalia, 3 Km al Noreste, Matagalpa, Nicaragua
| | - Ronan Sulpice
- National University Ireland Galway, Plant Systems Biology Laboratory, Ryan Institute, School of Natural Sciences , Galway H91 TK33, Ireland
| | - Marcus McHale
- National University Ireland Galway, Plant Systems Biology Laboratory, Ryan Institute, School of Natural Sciences , Galway H91 TK33, Ireland
| | - Hao Tong
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam , Potsdam-Golm 14476, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology , Potsdam-Golm 14476, Germany
- Center for Plant Systems Biology and Biotechnology , Plovdiv 4000, Bulgaria
| | - Lucile Toniutti
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
| | - Jonny Alonso Castillo
- Fundación Nicafrance , Finca La Cumplida Km. 147 Carretera Matagalpa - La Dalia, 3 Km al Noreste, Matagalpa, Nicaragua
| | - Benoît Bertrand
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement, Montpellier 34398, France
| | - Zoran Nikoloski
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam , Potsdam-Golm 14476, Germany
- Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology , Potsdam-Golm 14476, Germany
- Center for Plant Systems Biology and Biotechnology , Plovdiv 4000, Bulgaria
| |
|
14
|
Hansen PB, Ruud AK, de los Campos G, Malinowska M, Nagy I, Svane SF, Thorup-Kristensen K, Jensen JD, Krusell L, Asp T. Integration of DNA Methylation and Transcriptome Data Improves Complex Trait Prediction in Hordeum vulgare. PLANTS 2022; 11:plants11172190. [PMID: 36079572 PMCID: PMC9459846 DOI: 10.3390/plants11172190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 08/19/2022] [Accepted: 08/21/2022] [Indexed: 11/30/2022]
Abstract
Whole-genome multi-omics profiles contain valuable information for the characterization and prediction of complex traits in plants. In this study, we evaluate multi-omics models to predict four complex traits in barley (Hordeum vulgare): grain yield, thousand kernel weight, protein content, and nitrogen uptake. Genomic, transcriptomic, and DNA methylation data were obtained from 75 spring barley lines tested in the RadiMax semi-field phenomics facility under control and water-scarce treatment. By integrating multi-omics data at genomic, transcriptomic, and DNA methylation regulatory levels, a higher proportion of phenotypic variance was explained (0.72–0.91) than with genomic models alone (0.55–0.86). The correlation between predictions and phenotypes varied from 0.17–0.28 for control plants and 0.23–0.37 for water-scarce plants, and the increase in accuracy was significant for nitrogen uptake and protein content compared to models using genomic information alone. Adding transcriptomic and DNA methylation information to the prediction models explained more of the phenotypic variance attributed to the environment in grain yield and nitrogen uptake. It furthermore explained more of the non-additive genetic effects for thousand kernel weight and protein content. Our results show the feasibility of multi-omics prediction for complex traits in barley.
Affiliation(s)
- Pernille Bjarup Hansen
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Correspondence: (P.B.H.); (T.A.); Tel.: +45-87158243 (T.A.)
| | - Anja Karine Ruud
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Marta Malinowska
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Istvan Nagy
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Simon Fiil Svane
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
| | - Kristian Thorup-Kristensen
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
| | | | - Lene Krusell
- Sejet Plant Breeding, Nørremarksvej 67, 8700 Horsens, Denmark
| | - Torben Asp
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Correspondence: (P.B.H.); (T.A.); Tel.: +45-87158243 (T.A.)
| |
|
15
|
Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics 2022; 23:476. [PMID: 35764918 PMCID: PMC9238188 DOI: 10.1186/s12864-022-08690-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Accepted: 06/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background Multi-omics represent a promising link between phenotypes and genome variation. Few studies yet address their integration to understand genetic architecture and improve predictability. Results Our study used 241 poplar genotypes, phenotyped in two common gardens, with xylem and cambium RNA sequenced at one site, yielding large phenotypic, genomic (SNP), and transcriptomic datasets. Prediction models for each trait were built separately for SNPs and transcripts, and compared to a third model integrated by concatenation of both omics. The advantage of integration varied across traits and, to understand such differences, an eQTL analysis was performed to characterize the interplay between the genome and transcriptome and classify the predicting features into cis or trans relationships. A strong, significant negative correlation was found between the change in predictability and the change in predictor ranking for trans eQTLs for traits evaluated in the site of transcriptomic sampling. Conclusions Consequently, beneficial integration happens when the redundancy of predictors is decreased, likely leaving the stage to other less prominent but complementary predictors. An additional gene ontology (GO) enrichment analysis appeared to corroborate such statistical output. To our knowledge, this is a novel finding delineating a promising method to explore data integration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08690-7.
|
16
|
Phenomic Selection: A New and Efficient Alternative to Genomic Selection. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2467:397-420. [PMID: 35451784 DOI: 10.1007/978-1-0716-2205-6_14] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Recently, it has been proposed to replace molecular markers with near-infrared (NIR) spectra for inferring relationships between individuals and further performing phenomic selection (PS), analogous to genomic selection (GS). The PS concept is similar to genomic-like omics-based (GLOB) selection, in which molecular markers are replaced by endophenotypes, such as metabolites or transcript levels, except that the phenomic information obtained, for instance by near-infrared spectroscopy (NIRS), usually has a much lower cost than other omics. Though NIRS has been routinely used in breeding for several decades, especially to deal with end-product quality traits, its use to predict other traits of interest and further make selections is new. Since the seminal paper on PS, several publications have advocated the use of spectral acquisition (including NIRS and hyperspectral imaging) in plant breeding towards PS, potentially providing a scope of what is possible. In the present chapter, we first come back to the concept of PS as originally proposed and provide a classification of selected papers related to the use of phenomics in breeding. We further provide a review of the selected literature concerning the type of technology used, the preprocessing of the spectra, and the statistical modeling to make predictions. We discuss the factors that likely affect the efficiency of PS and compare it to GS in terms of predictive ability. Finally, we propose several prospects for future work and application of PS in the context of plant breeding.
|
17
|
Wu PY, Stich B, Weisweiler M, Shrestha A, Erban A, Westhoff P, Inghelandt DV. Improvement of prediction ability by integrating multi-omic datasets in barley. BMC Genomics 2022; 23:200. [PMID: 35279073 PMCID: PMC8917753 DOI: 10.1186/s12864-022-08337-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background Genomic prediction (GP) based on single nucleotide polymorphisms (SNP) has become a broadly used tool to increase the gain of selection in plant breeding. However, using predictors that are biologically closer to the phenotypes such as transcriptome and metabolome may increase the prediction ability in GP. The objectives of this study were to (i) assess the prediction ability for three yield-related phenotypic traits using different omic datasets as single predictors compared to a SNP array, where these omic datasets included different types of sequence variants (full-SV, deleterious-dSV, and tolerant-tSV), different types of transcriptome (expression presence/absence variation-ePAV, gene expression-GE, and transcript expression-TE) sampled from two tissues, leaf and seedling, and metabolites (M); (ii) investigate the improvement in prediction ability when combining multiple omic datasets information to predict phenotypic variation in barley breeding programs; (iii) explore the predictive performance when using SV, GE, and ePAV from simulated 3' end mRNA sequencing of different lengths as predictors. Results The prediction ability from genomic best linear unbiased prediction (GBLUP) for the three traits using dSV information was higher than when using tSV, all SV information, or the SNP array. Any predictors from the transcriptome (GE, TE, as well as ePAV) and metabolome provided higher prediction abilities compared to the SNP array and SV on average across the three traits. In addition, some (dis)similarity existed between different omic datasets, which therefore provided complementary biological perspectives on phenotypic variation. Optimally combining the information of dSV, TE, ePAV, as well as metabolites into GP models could improve the prediction ability over that of the single predictors alone. Conclusions The use of integrated omic datasets in GP models is highly recommended. Furthermore, we evaluated a cost-effective approach generating 3' end mRNA sequencing with transcriptome data extracted from seedling without losing prediction ability in comparison to the full-length mRNA sequencing, paving the path for the use of such prediction methods in commercial breeding programs. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08337-7.
|
18
|
Robert P, Auzanneau J, Goudemand E, Oury FX, Rolland B, Heumez E, Bouchet S, Le Gouis J, Rincent R. Phenomic selection in wheat breeding: identification and optimisation of factors influencing prediction accuracy and comparison to genomic selection. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2022; 135:895-914. [PMID: 34988629 DOI: 10.1007/s00122-021-04005-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/23/2021] [Accepted: 11/23/2021] [Indexed: 05/15/2023]
Abstract
Phenomic selection is a promising alternative or complement to genomic selection in wheat breeding. Models combining spectra from different environments maximise the predictive ability of grain yield and heading date of wheat breeding lines. Phenomic selection (PS) is a recent breeding approach similar to genomic selection (GS) except that genotyping is replaced by near-infrared (NIR) spectroscopy. PS can potentially account for non-additive effects and has the major advantage of being low cost and high throughput. Factors influencing GS predictive abilities have been intensively studied, but little is known about PS. We tested and compared the abilities of PS and GS to predict grain yield and heading date from several datasets of bread wheat lines corresponding to the first or second years of trial evaluation from two breeding companies and one research institute in France. We evaluated several factors affecting PS predictive abilities including the possibility of combining spectra collected in different environments. A simple H-BLUP model predicted both traits with prediction ability from 0.26 to 0.62 and with an efficient computation time. Our results showed that the environments in which lines are grown had a crucial impact on predictive ability based on the spectra acquired and was specific to the trait considered. Models combining NIR spectra from different environments were the best PS models and were at least as accurate as GS in most of the datasets. Furthermore, a GH-BLUP model combining genotyping and NIR spectra was the best model of all (prediction ability from 0.31 to 0.73). We demonstrated also that as for GS, the size and the composition of the training set have a crucial impact on predictive ability. PS could therefore replace or complement GS for efficient wheat breeding programs.
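As a rough illustration of predicting from a spectra-based relationship matrix, the sketch below builds a matrix H from standardized NIR spectra and predicts unobserved lines with a kernel ridge step. This is an assumption-laden stand-in, not the authors' H-BLUP: the shrinkage parameter `lam` is fixed by hand rather than derived from estimated variance components, the spectra are not preprocessed, and all function names and data are hypothetical.

```python
import math

def spectral_kernel(S):
    """H = Z Z' / p, with Z the column-centred and column-scaled spectra S."""
    n, p = len(S), len(S[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in S]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return [[sum(a * b for a, b in zip(Z[i], Z[j])) / p for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def predict(H, y_train, train, test, lam):
    """Kernel ridge prediction: mu + H[test, train] (H[train, train] + lam I)^-1 (y - mu)."""
    mu = sum(y_train) / len(y_train)
    A = [[H[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [v - mu for v in y_train])
    return [mu + sum(H[t][j] * a for j, a in zip(train, alpha)) for t in test]

# Made-up toy data: 5 lines, 4 wavelengths; yield observed on the first 4 lines.
spectra = [[0.42, 0.51, 0.63, 0.70], [0.40, 0.50, 0.66, 0.72],
           [0.45, 0.55, 0.60, 0.68], [0.39, 0.49, 0.67, 0.74],
           [0.44, 0.53, 0.61, 0.69]]
yield_train = [7.1, 7.6, 6.5, 7.9]
H = spectral_kernel(spectra)
pred = predict(H, yield_train, train=[0, 1, 2, 3], test=[4], lam=1.0)
```

A larger `lam` shrinks predictions toward the training mean. In a mixed-model fit this amount of shrinkage would instead follow from the estimated variance components.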
Affiliation(s)
- Pauline Robert
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, ClermontFerrand, France
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
| | - Jérôme Auzanneau
- Agri-Obtentions, Ferme de Gauvilliers, 78660, Orsonville, France
| | - Ellen Goudemand
- Florimond-Desprez Veuve & Fils SAS, 3 rue Florimond-Desprez, BP 41, 59242, Cappelle-en-Pévèle, France
| | - François-Xavier Oury
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, ClermontFerrand, France
| | - Bernard Rolland
- INRAE-Agrocampus Ouest-Université Rennes 1, UMR1349, IGEPP, Domaine de la Motte, 35653, Le Rheu, France
| | - Emmanuel Heumez
- INRAE, UE 972, Grandes Cultures Innovation Environnement, 2 Chaussée Brunehaut, 80200, EstréesMons, France
| | - Sophie Bouchet
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, ClermontFerrand, France
| | - Jacques Le Gouis
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, ClermontFerrand, France
| | - Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France.
- INRAE-Université Clermont-Auvergne, UMR1095, GDEC, 5 chemin de Beaulieu, 63000, ClermontFerrand, France.
| |
|
19
|
Zhao T, Zeng J, Cheng H. Extend mixed models to multilayer neural networks for genomic prediction including intermediate omics data. Genetics 2022; 221:6536967. [PMID: 35212766 PMCID: PMC9071534 DOI: 10.1093/genetics/iyac034] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2021] [Accepted: 02/17/2022] [Indexed: 11/13/2022] Open
Abstract
With the growing amount and diversity of intermediate omics data complementary to genomics (e.g. DNA methylation, gene expression, and protein abundance), there is a need to develop methods to incorporate intermediate omics data into conventional genomic evaluation. The omics data help decode the multiple layers of regulation from genotypes to phenotypes, and thus naturally form a connected multilayer network. We developed a new method named NN-MM to model the multiple layers of regulation from genotypes to intermediate omics features, then to phenotypes, by extending conventional linear mixed models ("MM") to multilayer artificial neural networks ("NN"). NN-MM incorporates intermediate omics features by adding middle layers between genotypes and phenotypes. Linear mixed models (e.g. pedigree-based BLUP, GBLUP, Bayesian Alphabet, single-step GBLUP, or single-step Bayesian Alphabet) can be used to sample marker effects or genetic values on intermediate omics features, and activation functions in neural networks are used to capture the nonlinear relationships between intermediate omics features and phenotypes. NN-MM had significantly better prediction performance than the recently proposed single-step approach for genomic prediction with intermediate omics data. Compared to the single-step approach, NN-MM can handle various patterns of missing omics measures and allows nonlinear relationships between intermediate omics features and phenotypes. NN-MM has been implemented in an open-source package called "JWAS".
Affiliation(s)
- Tianjing Zhao
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA,Integrative Genetics and Genomics Graduate Group, University of California Davis, Davis, CA 95616, USA
| | - Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Hao Cheng
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA,Corresponding author: Department of Animal Science, University of California, Davis, CA 95616, USA.
| |
|
20
|
Van Tassel DL, DeHaan LR, Diaz-Garcia L, Hershberger J, Rubin MJ, Schlautman B, Turner K, Miller AJ. Re-imagining crop domestication in the era of high throughput phenomics. CURRENT OPINION IN PLANT BIOLOGY 2022; 65:102150. [PMID: 34883308 DOI: 10.1016/j.pbi.2021.102150] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/01/2021] [Revised: 10/19/2021] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
De novo domestication is an exciting option for increasing species diversity and ecosystem service functionality of agricultural landscapes. Genomic selection (GS), the application of genomic markers to predict phenotypic traits in a breeding population, offers the possibility of rapid genetic improvement, making GS especially attractive for modifying traits of long-lived species. However, for some wild species just entering the domestication pipeline, especially those with large and complex genomes, a lack of funding and/or prior genome characterization, GS is often out of reach. High throughput phenomics has the potential to augment traditional pedigree selection, reduce costs and amplify impacts of genomic selection, and even create new predictive selection approaches independent of sequencing or pedigrees.
Affiliation(s)
| | - Lee R DeHaan
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA
| | | | - Jenna Hershberger
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA; Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA
| | - Matthew J Rubin
- Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA
| | | | - Kathryn Turner
- The Land Institute, 2440 E Water Well Rd., Salina, KS, 67401, USA
| | - Allison J Miller
- Donald Danforth Plant Science Center, 975 North Warson Road, Saint Louis, MO, 63132, USA; Saint Louis University Department of Biology, 3507 Laclede Avenue, St. Louis, MO, 63103, USA.
| |
|
21
|
Nantongo JS, Potts BM, Frickey T, Telfer E, Dungey H, Fitzgerald H, O'Reilly-Wapstra JM. Analysis of the transcriptome of the needles and bark of Pinus radiata induced by bark stripping and methyl jasmonate. BMC Genomics 2022; 23:52. [PMID: 35026979 PMCID: PMC8759178 DOI: 10.1186/s12864-021-08231-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Plants are attacked by diverse insect and mammalian herbivores and respond with different physical and chemical defences. Transcriptional changes underlie these phenotypic changes. Simulated herbivory has been used to study the transcriptional and other early regulation events of these plant responses. In this study, constitutive and induced transcriptional responses to artificial bark stripping are compared in the needles and the bark of Pinus radiata to the responses from application of the plant stressor, methyl jasmonate. The time progression of the responses was assessed over a 4-week period. RESULTS Of the 6312 unique transcripts studied, 86.6% were differentially expressed between the needles and the bark prior to treatment. The most abundant constitutive transcripts were related to defence and photosynthesis and their expression did not differ between the needles and the bark. While no differential expression of transcripts was detected in the needles following bark stripping, in the bark this treatment caused an up-regulation and down-regulation of genes associated with primary and secondary metabolism. Methyl jasmonate treatment caused differential expression of transcripts in both the bark and the needles, with individual genes related to primary metabolism more responsive than those associated with secondary metabolism. The up-regulation of genes related to sugar break-down and the repression of genes related to photosynthesis following both treatments were consistent with the strong down-regulation of sugars that has been observed in the same population. Relative to the control, the treatments caused a differential expression of genes involved in signalling, photosynthesis, carbohydrate and lipid metabolism as well as defence and water stress. However, non-overlapping transcripts were detected between the needles and the bark, between treatments and at different times of assessment. Methyl jasmonate induced more transcriptional responses in the bark than bark stripping, although the peak of expression following both treatments was detected 7 days post treatment application. The effects of bark stripping were localised, and no systemic changes were detected in the needles. CONCLUSION There are constitutive and induced differences in the needle and bark transcriptome of Pinus radiata. Some expression responses to bark stripping may differ from other biotic and abiotic stresses, which contributes to the understanding of plant molecular responses to diverse stresses. Whether the gene expression changes are heritable and how they differ between resistant and susceptible families identified in earlier studies need further investigation.
Affiliation(s)
- J S Nantongo
  - School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
  - National Forestry Resources Research Institute, Mukono, Uganda
- B M Potts
  - School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
  - ARC Training Centre for Forest Value, Hobart, Tasmania, Australia
- H Fitzgerald
  - School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
- J M O'Reilly-Wapstra
  - School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
  - ARC Training Centre for Forest Value, Hobart, Tasmania, Australia
|
22
|
Martini JWR, Gao N, Crossa J. Incorporating Omics Data in Genomic Prediction. Methods Mol Biol 2022; 2467:341-357. [PMID: 35451782 DOI: 10.1007/978-1-0716-2205-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
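The standardization point in this abstract can be illustrated with a minimal sketch. It assumes a linear relationship matrix of the form K = ZZ'/p built from an n x p matrix of omics features; the function name `relationship_matrix`, the simulated data, and the chosen feature scales are hypothetical and are not taken from the chapter.

```python
import numpy as np

def relationship_matrix(X, standardize=True):
    """Linear relationship matrix K = Z Z' / p from an n x p feature matrix X."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)          # center every feature (column)
    if standardize:
        sd = Z.std(axis=0)
        sd[sd == 0] = 1.0           # constant features stay at zero after centering
        Z = Z / sd                  # put every feature on unit variance
    return Z @ Z.T / X.shape[1]

rng = np.random.default_rng(0)
# Hypothetical data: 6 lines x 4 omics features measured on very different scales
scales = np.array([1.0, 10.0, 100.0, 1000.0])
X = rng.normal(size=(6, 4)) * scales

K_raw = relationship_matrix(X, standardize=False)  # dominated by the largest-scale feature
K_std = relationship_matrix(X, standardize=True)   # every feature contributes equally
```

Without standardization, features with large variance (common for expression or metabolite abundances) dominate the similarity between individuals; with standardization, each feature contributes the same amount to the trace of K.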
Affiliation(s)
- Johannes W R Martini
  - International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
- Ning Gao
  - School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
  - International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
|
23
|
Shi S, Zhang Z, Li B, Zhang S, Fang L. Incorporation of Trait-Specific Genetic Information into Genomic Prediction Models. Methods Mol Biol 2022; 2467:329-340. [PMID: 35451781 DOI: 10.1007/978-1-0716-2205-6_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Due to the rapid development of high-throughput sequencing technology, we can easily obtain not only the genetic variants at the whole-genome sequence level (e.g., from 1000 Genomes project and 1000 Bull Genomes project), but also a wide range of functional annotations (e.g., enhancers and promoters from ENCODE, FAANG, and FarmGTEx projects) across a wide range of tissues, cell types, developmental stages, and environmental conditions. This huge amount of information leads to a revolution in studying genetics and genomics of complex traits in humans, livestock, and plant species. In this chapter, we focused on and reviewed the genomic prediction methods that incorporate external biological information into genomic prediction, such as sequence ontology, linkage disequilibrium (LD) of SNPs, quantitative trait loci (QTL), and multi-layer omics data (e.g., transcriptome, epigenome, and microbiome).
Affiliation(s)
- Shaolei Shi
  - College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhe Zhang
  - Department of Animal Breeding and Genetics, College of Animal Science, South China Agricultural University (SCAU), Guangzhou, China
- Bingjie Li
  - The Roslin Institute Building, Scotland's Rural College, Edinburgh, UK
- Shengli Zhang
  - College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lingzhao Fang
  - MRC Human Genetics Unit at the Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
|
24
|
Hu H, Campbell MT, Yeats TH, Zheng X, Runcie DE, Covarrubias-Pazaran G, Broeckling C, Yao L, Caffe-Treml M, Gutiérrez LA, Smith KP, Tanaka J, Hoekenga OA, Sorrells ME, Gore MA, Jannink JL. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021. [PMID: 34643760 DOI: 10.25739/8p1e-0931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
Abstract
Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.
Affiliation(s)
- Haixiao Hu
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Malachy T Campbell
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Trevor H Yeats
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Xuying Zheng
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Daniel E Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
- Giovanny Covarrubias-Pazaran
  - International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56130, Texcoco, Edo. de México, México
- Corey Broeckling
  - Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
- Linxing Yao
  - Proteomics and Metabolomics Facility, Colorado State University, C130 Microbiology, 2021 Campus Delivery, Fort Collins, CO, 80521, USA
- Melanie Caffe-Treml
  - Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, 57007, USA
- Lucía Gutiérrez
  - Department of Agronomy, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Kevin P Smith
  - Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA
- James Tanaka
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Owen A Hoekenga
  - Cayuga Genetics Consulting Group LLC, Ithaca, NY, 14850, USA
- Mark E Sorrells
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
- Jean-Luc Jannink
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA
  - USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, 14853, USA
|
25
|
Hu H, Campbell MT, Yeats TH, Zheng X, Runcie DE, Covarrubias-Pazaran G, Broeckling C, Yao L, Caffe-Treml M, Gutiérrez LA, Smith KP, Tanaka J, Hoekenga OA, Sorrells ME, Gore MA, Jannink JL. Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:4043-4054. [PMID: 34643760 PMCID: PMC8580906 DOI: 10.1007/s00122-021-03946-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/02/2021] [Accepted: 09/05/2021] [Indexed: 05/26/2023]
|
26
|
Hao D, Bai J, Du J, Wu X, Thomsen B, Gao H, Su G, Wang X. Overview of Metabolomic Analysis and the Integration with Multi-Omics for Economic Traits in Cattle. Metabolites 2021; 11:753. [PMID: 34822411 PMCID: PMC8621036 DOI: 10.3390/metabo11110753] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/29/2021] [Revised: 10/27/2021] [Accepted: 10/28/2021] [Indexed: 12/23/2022]
Abstract
Metabolomics has been applied to measure dynamic metabolic responses, to understand systematic biological networks, and to reveal potential genetic architecture for human diseases and livestock traits. For example, published results for different cattle traits include relevant candidate metabolites, metabolic pathways, and potential systematic networks that can inform further metabolomic and integrated omics studies. A summary of the applications of metabolomics to economic traits in cattle is therefore needed. Here we provide a comprehensive review of metabolomic analysis and its integration with other omics in five aspects: (1) characterization of the metabolomic profile of cattle; (2) metabolomic applications in cattle; (3) integrated metabolomic analysis with other omics; (4) methods and tools in metabolomic analysis; and (5) further potentialities. The review surveys existing metabolomic studies in cattle, including those integrated with other omics, to understand the metabolic mechanisms underlying economic traits and to provide useful information for further research and practical breeding programs in cattle.
Affiliation(s)
- Dan Hao
  - Beijing Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Beijing 100193, China
  - Shijiazhuang Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Shijiazhuang 052463, China
  - Department of Molecular Biology and Genetics, Aarhus University, 8000 Aarhus, Denmark
- Jiangsong Bai
  - Beijing Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Beijing 100193, China
  - Shijiazhuang Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Shijiazhuang 052463, China
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jianyong Du
  - Beijing Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Beijing 100193, China
  - Shijiazhuang Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Shijiazhuang 052463, China
  - College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Xiaoping Wu
  - Beijing Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Beijing 100193, China
  - Shijiazhuang Zhongnongtongchuang (ZNTC) Biotechnology Co., Ltd., Shijiazhuang 052463, China
- Bo Thomsen
  - Department of Molecular Biology and Genetics, Aarhus University, 8000 Aarhus, Denmark
- Hongding Gao
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Guosheng Su
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Xiao Wang
  - Konge Larsen ApS, 2800 Kongens Lyngby, Denmark
  - Corresponding author
|
27
|
Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize. PLoS Genet 2021; 17:e1009568. [PMID: 34606492 PMCID: PMC8516254 DOI: 10.1371/journal.pgen.1009568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/27/2021] [Revised: 10/14/2021] [Accepted: 09/07/2021] [Indexed: 11/19/2022]
Abstract
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
Genomic marker data is widely used in the prediction of many traits. However, prediction has been primarily carried out within populations and without explicit modeling of RNA or protein expression. In this study, we explored the prediction of field traits within and across populations using estimated RNA expression attributable to only the DNA sequence around a gene. We showed that the estimated RNA expression was more transferable across populations and tissues than measured RNA expression. We improved prediction of field traits up to 15% using estimated gene expression as compared to observed expression or gene sequence alone. Overall, these findings indicate that structural and functional information in the gene sequence is highly transferable.
|
28
|
Christensen OF, Börner V, Varona L, Legarra A. Genetic evaluation including intermediate omics features. Genetics 2021; 219:6345349. [PMID: 34849886 DOI: 10.1093/genetics/iyab130] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/20/2021] [Accepted: 07/13/2021] [Indexed: 11/14/2022]
Abstract
In animal and plant breeding and genetics, there has been an increasing interest in intermediate omics traits, such as metabolomics and transcriptomics, which mediate the effect of genetics on the phenotype of interest. For inclusion of such intermediate traits into a genetic evaluation system, there is a need for a statistical model that integrates phenotypes, genotypes, pedigree, and omics traits, and a need for associated computational methods that provide estimated breeding values. In this paper, a joint model for phenotypes and omics data is presented, and a formula for the breeding values on individuals is derived. For complete omics data, three equivalent methods for best linear unbiased prediction of breeding values are presented. In all three cases, this requires solving two mixed model equation systems. Estimation of parameters using restricted maximum likelihood is also presented. For incomplete omics data, extensions of two of these methods are presented, where in both cases, the extension consists of extending an omics-related similarity matrix to incorporate individuals without omics data. The methods are illustrated using a simulated data set.
Affiliation(s)
- Ole F Christensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Vinzent Börner
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8830 Tjele, Denmark
- Luis Varona
  - Departamento de Anatomía, Embriología y Genética Animal, Universidad de Zaragoza, 50013 Zaragoza, Spain
- Andres Legarra
  - GenPhySE (Génétique, Physiologie et Systèmes d'Elevage), INRA, 31326 Castanet-Tolosan, France
|
29
|
Zhang T, Jiang L, Ruan L, Qian Y, Liang S, Lin F, Lu H, Dai H, Zhao H. Heterotic quantitative trait loci analysis and genomic prediction of seedling biomass-related traits in maize triple testcross populations. PLANT METHODS 2021; 17:85. [PMID: 34330310 PMCID: PMC8325263 DOI: 10.1186/s13007-021-00785-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Accepted: 07/23/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Heterosis has been widely used in maize breeding. However, we know little about heterotic quantitative trait loci and their roles in genomic prediction. In this study, we sought to identify heterotic quantitative trait loci for seedling biomass-related traits using a triple testcross design and to compare prediction accuracies obtained by fitting molecular markers and heterotic quantitative trait loci. RESULTS A triple testcross population comprised of 366 genotypes was constructed by crossing each of 122 intermated B73 × Mo17 genotypes with B73, Mo17, and B73 × Mo17. The mid-parent heterosis of the seedling biomass-related traits (leaf length, leaf width, leaf area, and seedling dry weight) displayed a large range, from less than 50% to ~150%. Relationships between the heterosis of seedling biomass-related traits were congruent with those between trait performances. Based on a linkage map comprised of 1631 markers, 14 augmented additive, two augmented dominance, and three dominance × additive epistatic quantitative trait loci for heterosis of seedling biomass-related traits were identified, with each individually explaining 4.1-20.5% of the phenotypic variation. All modes of gene action, i.e., additive, partially dominant, dominant, and overdominant modes, were observed. In addition, ten additive × additive and six dominance × dominance epistatic interactions were identified. By implementing the general and special combining ability model, we found that prediction accuracy ranged from 0.29 for leaf length to 0.56 for leaf width. Analyses with different numbers of markers showed that ~800 markers were enough to achieve nearly the highest prediction accuracies. When the heterotic quantitative trait loci were incorporated into the model, we found no significant change in prediction accuracy, with only leaf length showing a marginal improvement of 1.7%. 
CONCLUSIONS Our results demonstrated that the triple testcross design is suitable for detecting heterotic quantitative trait loci and evaluating the prediction accuracy. Seedling leaf width can be used as the representative trait for seedling prediction. The heterotic quantitative trait loci are not necessary for genomic prediction of seedling biomass-related traits.
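For readers unfamiliar with the percentages quoted in this abstract, the following minimal sketch computes mid-parent heterosis using the standard textbook definition (hybrid value relative to the mean of its two parents). The function name and the example values are hypothetical, and the sketch does not reproduce the triple testcross analysis used in the study.

```python
def mid_parent_heterosis(hybrid, parent1, parent2):
    """Mid-parent heterosis in percent: 100 * (F1 - MP) / MP, with MP the parental mean."""
    mid_parent = (parent1 + parent2) / 2.0
    if mid_parent == 0:
        raise ValueError("mid-parent value is zero; heterosis is undefined")
    return 100.0 * (hybrid - mid_parent) / mid_parent

# Hypothetical seedling dry weights (arbitrary units): parents 5 and 7, hybrid 12
mph = mid_parent_heterosis(hybrid=12.0, parent1=5.0, parent2=7.0)
print(mph)  # 100.0, i.e. the hybrid exceeds the parental mean of 6 by 100%
```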
Affiliation(s)
- Tifu Zhang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Lu Jiang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Long Ruan
  - Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Yiliang Qian
  - Institute of Tobacco, Anhui Academy of Agricultural Sciences, Hefei, 230001, China
- Shuaiqiang Liang
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Feng Lin
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Haiyan Lu
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
- Huixue Dai
  - Nanjing Institute of Vegetable Sciences, Nanjing, 210042, China
- Han Zhao
  - Jiangsu Provincial Key Laboratory of Agrobiology, Institute of Germplasm Resources and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, 210014, China
|
30
|
Pazhamala LT, Kudapa H, Weckwerth W, Millar AH, Varshney RK. Systems biology for crop improvement. THE PLANT GENOME 2021; 14:e20098. [PMID: 33949787 DOI: 10.1002/tpg2.20098] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/05/2020] [Accepted: 03/09/2021] [Indexed: 05/19/2023]
Abstract
In recent years, generation of large-scale data from the genome, transcriptome, proteome, metabolome, epigenome, and other layers has become routine in several plant species. Most of these datasets in different crop species, however, were studied independently, and as a result full insight could not be gained into the molecular basis of complex traits and biological networks. A systems biology approach involving integration of multiple omics data, modeling, and prediction of cellular functions is required to understand the flow of biological information that underlies complex traits. In this context, systems biology with multiomics data integration is crucial and allows a holistic understanding of the dynamic system, in which different levels of biological organization interact with the external environment to produce a phenotype. Here, we present recent progress in omics studies and in integrative and systems biology approaches, with a special focus on application to crop improvement. We also discuss the challenges and opportunities in multiomics data integration, modeling, and understanding of the biology of complex traits underpinning yield and stress tolerance in major cereals and legumes.
Affiliation(s)
- Lekha T Pazhamala
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Himabindu Kudapa
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Wolfram Weckwerth
  - Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
  - Vienna Metabolomics Center, University of Vienna, Vienna, Austria
- A Harvey Millar
  - ARC Centre of Excellence in Plant Energy Biology and School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia
- Rajeev K Varshney
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
  - State Agricultural Biotechnology Centre, Crop Research Innovation Centre, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
|
31
|
Rice BR, Lipka AE. Diversifying maize genomic selection models. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:33. [PMID: 37309328 PMCID: PMC10236107 DOI: 10.1007/s11032-021-01221-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2020] [Accepted: 03/07/2021] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is one of the most powerful tools available for maize breeding. Its use of genome-wide marker data to estimate breeding values translates to increased genetic gains with fewer breeding cycles. In this review, we cover the history of GS and highlight particular milestones during its adaptation to maize breeding. We discuss how GS can be applied to developing superior maize inbreds and hybrids. Additionally, we characterize refinements in GS models that could enable the encapsulation of non-additive genetic effects, genotype by environment interactions, and multiple levels of the biological hierarchy, all of which could ultimately result in more accurate predictions of breeding values. Finally, we suggest the stages in a maize breeding program where it would be beneficial to apply GS. Given the current sophistication of high-throughput phenotypic, genotypic, and other -omic level data currently available to the maize community, now is the time to explore the implications of their incorporation into GS models and thus ensure that genetic gains are being achieved as quickly and efficiently as possible.
Collapse
Affiliation(s)
- Brian R. Rice
- Department of Crop Sciences, University of Illinois, Urbana, IL USA
|
32
|
Knoch D, Werner CR, Meyer RC, Riewe D, Abbadi A, Lücke S, Snowdon RJ, Altmann T. Multi-omics-based prediction of hybrid performance in canola. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1147-1165. [PMID: 33523261 PMCID: PMC7973648 DOI: 10.1007/s00122-020-03759-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Accepted: 12/19/2020] [Indexed: 05/05/2023]
Abstract
Complementing or replacing genetic markers with transcriptomic data and use of reproducing kernel Hilbert space regression based on Gaussian kernels increases hybrid prediction accuracies for complex agronomic traits in canola. In plant breeding, hybrids gained particular importance due to heterosis, the superior performance of offspring compared to their inbred parents. Since the development of new top performing hybrids requires labour-intensive and costly breeding programmes, including testing of large numbers of experimental hybrids, the prediction of hybrid performance is of utmost interest to plant breeders. In this study, we tested the effectiveness of hybrid prediction models in spring-type oilseed rape (Brassica napus L./canola) employing different omics profiles, individually and in combination. To this end, a population of 950 F1 hybrids was evaluated for seed yield and six other agronomically relevant traits in commercial field trials at several locations throughout Europe. A subset of these hybrids was also evaluated in a climatized glasshouse regarding early biomass production. For each of the 477 parental rapeseed lines, 13,201 single nucleotide polymorphisms (SNPs), 154 primary metabolites, and 19,479 transcripts were determined and used as predictive variables. Both SNP markers and transcripts effectively predict hybrid performance using (genomic) best linear unbiased prediction models (gBLUP). Compared to models using pure genetic markers, models incorporating transcriptome data resulted in significantly higher prediction accuracies for five out of seven agronomic traits, indicating that transcripts carry important information beyond genomic data. Notably, reproducing kernel Hilbert space regression based on Gaussian kernels significantly exceeded the predictive abilities of gBLUP models for six of the seven agronomic traits, demonstrating its potential for implementation in future canola breeding programmes.
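Illustrative sketch (not the authors' implementation): the abstract above contrasts gBLUP with reproducing kernel Hilbert space regression based on Gaussian kernels. The Python below shows, under stated assumptions, how a Gaussian kernel can be built from omics profiles and used for prediction. Plain kernel ridge regression is swapped in as a simple stand-in for the RKHS model; the function names, the `bandwidth` and `lam` parameters, and the scaling of distances by their mean are all assumptions made for illustration.

```python
import math


def gaussian_kernel(X, bandwidth=1.0):
    """K[i][j] = exp(-bandwidth * d2(i, j) / mean_d2), with d2 the squared
    Euclidean distance between profiles i and j (scaling is an assumption)."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    off_diag = [d2[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)  # mean off-diagonal squared distance
    return [[math.exp(-bandwidth * d2[i][j] / scale) for j in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def kernel_ridge_predict(K, y_train, train, test, lam=0.1):
    """Predict test individuals as K[test, train] (K[train, train] + lam I)^-1 y."""
    A = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, y_train)
    return [sum(K[t][j] * a for j, a in zip(train, alpha)) for t in test]


# Toy usage: four individuals with two hypothetical features each;
# the last individual is treated as unobserved and predicted.
X = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0], [1.5, 0.2]]
K = gaussian_kernel(X)
print(kernel_ridge_predict(K, [1.0, 2.0, 3.5], train=[0, 1, 2], test=[3]))
```

Replacing `gaussian_kernel` with a linear kernel (a scaled cross-product of the profiles) would give a gBLUP-like predictor in the same framework, which is the comparison the abstract describes.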
Affiliation(s)
- Dominic Knoch
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, OT Gatersleben Germany
| | - Christian R. Werner
- The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG Scotland, UK
| | - Rhonda C. Meyer
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, OT Gatersleben Germany
| | - David Riewe
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, OT Gatersleben Germany
- Institute for Ecological Chemistry, Plant Analysis and Stored Product Protection, Julius Kühn Institute (JKI)—Federal Research Centre for Cultivated Plants, 14195 Berlin, Germany
| | - Amine Abbadi
- NPZ Innovation GmbH, Hohenlieth, 24363 Holtsee, Germany
- Norddeutsche Pflanzenzucht Hans-Georg Lembke KG, Hohenlieth, 24363 Holtsee, Germany
| | - Sophie Lücke
- Norddeutsche Pflanzenzucht Hans-Georg Lembke KG, Hohenlieth, 24363 Holtsee, Germany
| | - Rod J. Snowdon
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany
| | - Thomas Altmann
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, OT Gatersleben Germany
|
33
|
Campbell MT, Hu H, Yeats TH, Brzozowski LJ, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Improving Genomic Prediction for Seed Quality Traits in Oat (Avena sativa L.) Using Trait-Specific Relationship Matrices. Front Genet 2021; 12:643733. [PMID: 33868378 PMCID: PMC8044359 DOI: 10.3389/fgene.2021.643733] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/18/2020] [Accepted: 03/04/2021] [Indexed: 11/13/2022] Open
Abstract
The observable phenotype is the manifestation of information that is passed along different organization levels (transcriptional, translational, and metabolic) of a biological system. The widespread use of various omic technologies (RNA-sequencing, metabolomics, etc.) has provided plant geneticists and breeders with a wealth of information on pertinent intermediate molecular processes that may help explain variation in conventional traits such as yield, seed quality, and fitness, among others. A major challenge is effectively using these data to help predict the genetic merit of new, unobserved individuals for conventional agronomic traits. Trait-specific genomic relationship matrices (TGRMs) model the relationships between individuals using genome-wide markers (SNPs) and place greater emphasis on markers that are most relevant to the trait compared to conventional genomic relationship matrices. Given that these approaches define relationships based on putative causal loci, it is expected that these approaches should improve predictions for related traits. In this study we evaluated the use of TGRMs to accommodate information on intermediate molecular phenotypes (referred to as endophenotypes) and to predict an agronomic trait, total lipid content, in oat seed. Nine fatty acids were quantified in a panel of 336 oat lines. Marker effects were estimated for each endophenotype, and were used to construct TGRMs. A multikernel TGRM model (MK-TGRM-BLUP) was used to predict total seed lipid content in an independent panel of 210 oat lines. The MK-TGRM-BLUP approach significantly improved predictions for total lipid content when compared to a conventional genomic BLUP (gBLUP) approach.
Given that the MK-TGRM-BLUP approach leverages information on the nine fatty acids to predict genetic values for total lipid content in unobserved individuals, we compared the MK-TGRM-BLUP approach to a multi-trait gBLUP (MT-gBLUP) approach that jointly fits phenotypes for fatty acids and total lipid content. The MK-TGRM-BLUP approach significantly outperformed MT-gBLUP. Collectively, these results highlight the utility of using TGRM to accommodate information on endophenotypes and improve genomic prediction for a conventional agronomic trait.
Affiliation(s)
- Malachy T. Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Trevor H. Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Lauren J. Brzozowski
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Melanie Caffe-Treml
- Seed Technology Lab 113, Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, United States
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, United States
| | - Kevin P. Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, United States
| | - Mark E. Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Michael A. Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
| | - Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY, United States
|
34
|
Campbell MT, Hu H, Yeats TH, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Translating insights from the seed metabolome into improved prediction for lipid-composition traits in oat (Avena sativa L.). Genetics 2021; 217:iyaa043. [PMID: 33789350 PMCID: PMC8045723 DOI: 10.1093/genetics/iyaa043] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/29/2020] [Accepted: 12/08/2020] [Indexed: 12/13/2022] Open
Abstract
Oat (Avena sativa L.) seed is a rich resource of beneficial lipids, soluble fiber, protein, and antioxidants, and is considered a healthful food for humans. Little is known regarding the genetic controllers of variation for these compounds in oat seed. We characterized natural variation in the mature seed metabolome using untargeted metabolomics on 367 diverse lines and leveraged this information to improve prediction for seed quality traits. We used a latent factor approach to define unobserved variables that may drive covariance among metabolites. One hundred latent factors were identified, of which 21% were enriched for compounds associated with lipid metabolism. Through a combination of whole-genome regression and association mapping, we show that latent factors that generate covariance for many metabolites tend to have a complex genetic architecture. Nonetheless, we recovered significant associations for 23% of the latent factors. These associations were used to inform a multi-kernel genomic prediction model, which was used to predict seed lipid and protein traits in two independent studies. Predictions for 8 of the 12 traits were significantly improved compared to genomic best linear unbiased prediction when this prediction model was informed using associations from lipid-enriched factors. This study provides new insights into variation in the oat seed metabolome and provides genomic resources for breeders to improve selection for health-promoting seed quality traits. More broadly, we outline an approach to distill high-dimensional "omics" data to a set of biologically meaningful variables and translate inferences on these data into improved breeding decisions.
Affiliation(s)
- Malachy T Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Trevor H Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Melanie Caffe-Treml
- Department of Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD 57007, USA
| | - Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Kevin P Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Mark E Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Michael A Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY 14853, USA
|
35
|
Gonçalves MTV, Morota G, Costa PMDA, Vidigal PMP, Barbosa MHP, Peternelli LA. Near-infrared spectroscopy outperforms genomics for predicting sugarcane feedstock quality traits. PLoS One 2021; 16:e0236853. [PMID: 33661948 PMCID: PMC7932073 DOI: 10.1371/journal.pone.0236853] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/13/2020] [Accepted: 01/20/2021] [Indexed: 11/19/2022] Open
Abstract
The main objectives of this study were to evaluate the prediction performance of genomic and near-infrared spectroscopy (NIR) data and whether the integration of genomic and NIR predictor variables can increase the prediction accuracy of two feedstock quality traits (fiber and sucrose content) in a sugarcane population (Saccharum spp.). The following three modeling strategies were compared: M1 (genome-based prediction), M2 (NIR-based prediction), and M3 (integration of genomics and NIR wavenumbers). Data were collected from a commercial population comprised of three hundred and eighty-five individuals, genotyped for single nucleotide polymorphisms and screened using NIR spectroscopy. We compared partial least squares (PLS) and BayesB regression methods to estimate marker and wavenumber effects. In order to assess model performance, we employed random sub-sampling cross-validation to calculate the mean Pearson correlation coefficient between observed and predicted values. Our results showed that models fitted using BayesB were more predictive than PLS models. We found that NIR (M2) provided the highest prediction accuracy, whereas genomics (M1) presented the lowest predictive ability, regardless of the measured traits and regression methods used. The integration of predictors derived from NIR spectroscopy and genomics into a single model (M3) did not significantly improve the prediction accuracy for the two traits evaluated. These findings suggest that NIR-based prediction can be an effective strategy for predicting the genetic merit of sugarcane clones.
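Illustrative sketch (not the authors' code): the abstract above assesses models by random sub-sampling cross-validation, reporting the mean Pearson correlation between observed and predicted values. The Python below shows that evaluation loop under stated assumptions. A one-predictor least-squares fit is swapped in for the PLS and BayesB models used in the study, and the function names, number of splits, test fraction, and toy data are all hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_simple_regression(x, y):
    """Least-squares slope and intercept for one predictor (stand-in model)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def subsampling_cv(x, y, n_splits=20, test_fraction=0.2, seed=1):
    """Mean Pearson r between observed and predicted values over random splits."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(2, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random train/test partition for each split
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_simple_regression(
            [x[i] for i in train], [y[i] for i in train])
        predicted = [intercept + slope * x[i] for i in test]
        observed = [y[i] for i in test]
        scores.append(pearson(observed, predicted))
    return mean(scores)


# Toy usage: a noisy linear relationship between one predictor and a trait.
toy_rng = random.Random(0)
toy_x = [float(i) for i in range(50)]
toy_y = [2.0 * v + toy_rng.gauss(0.0, 5.0) for v in toy_x]
print(round(subsampling_cv(toy_x, toy_y), 3))
```

Unlike k-fold cross-validation, random sub-sampling draws each test set independently, so an individual may appear in several test sets or in none.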
Affiliation(s)
| | - Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
|
36
|
Tong H, Nikoloski Z. Machine learning approaches for crop improvement: Leveraging phenotypic and genotypic big data. JOURNAL OF PLANT PHYSIOLOGY 2021; 257:153354. [PMID: 33385619 DOI: 10.1016/j.jplph.2020.153354] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/30/2020] [Revised: 12/14/2020] [Accepted: 12/15/2020] [Indexed: 05/07/2023]
Abstract
Highly efficient and accurate selection of elite genotypes can lead to dramatic shortening of the breeding cycle in major crops relevant for sustaining present demands for food, feed, and fuel. In contrast to classical approaches that emphasize the need for resource-intensive phenotyping at all stages of artificial selection, genomic selection dramatically reduces the need for phenotyping. Genomic selection relies on advances in machine learning and the availability of genotyping data to predict agronomically relevant phenotypic traits. Here we provide a systematic review of machine learning approaches applied for genomic selection of single and multiple traits in major crops in the past decade. We emphasize the need to gather data on intermediate phenotypes, e.g. metabolite, protein, and gene expression levels, along with developments of modeling techniques that can lead to further improvements of genomic selection. In addition, we provide a critical view of factors that affect genomic selection, with attention to transferability of models between different environments. Finally, we highlight the future aspects of integrating high-throughput molecular phenotypic data from omics technologies with biological networks for crop improvement.
Affiliation(s)
- Hao Tong
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany; Bioinformatics and Mathematical Modeling Department, Centre for Plant Systems Biology and Biotechnology, Plovdiv, Bulgaria; Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
| | - Zoran Nikoloski
- Bioinformatics Group, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany; Bioinformatics and Mathematical Modeling Department, Centre for Plant Systems Biology and Biotechnology, Plovdiv, Bulgaria; Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany.
|
37
|
Xu Y, Zhao Y, Wang X, Ma Y, Li P, Yang Z, Zhang X, Xu C, Xu S. Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:261-272. [PMID: 32738177 PMCID: PMC7868986 DOI: 10.1111/pbi.13458] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/03/2019] [Revised: 06/14/2020] [Accepted: 07/22/2020] [Indexed: 05/15/2023]
Abstract
Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids out of numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With the recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, the current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations from four sets of predictors from the parents, that is, genome, transcriptome, metabolome and phenome. The predictability for each combination was evaluated using the best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but joint prediction with various combinations of the predictors significantly improved predictability relative to prediction of any single source omic data for each trait investigated. Incorporation of parental phenotypic data into various omic predictors increased the predictability, on average by 13.6%, 54.5%, 19.9% and 8.3%, for grain yield, number of tillers per plant, number of grains per panicle and 1000 grain weight, respectively. Among nine models of incorporating parental traits, the AD-All model was the most effective one. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to improve hybrid breeding progress, especially with the development of high-throughput phenotyping technologies.
Affiliation(s)
- Yang Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Yue Zhao
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Xin Wang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Ying Ma
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Pengcheng Li
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Zefeng Yang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Mexico, DF, Mexico
| | - Chenwu Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
| | - Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
|
38
|
Farooq M, van Dijk ADJ, Nijveen H, Aarts MGM, Kruijer W, Nguyen TP, Mansoor S, de Ridder D. Prior Biological Knowledge Improves Genomic Prediction of Growth-Related Traits in Arabidopsis thaliana. Front Genet 2021; 11:609117. [PMID: 33552126 PMCID: PMC7855462 DOI: 10.3389/fgene.2020.609117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/22/2020] [Accepted: 12/21/2020] [Indexed: 01/11/2023] Open
Abstract
Prediction of growth-related complex traits is highly important for crop breeding. Photosynthesis efficiency and biomass are direct indicators of overall plant performance and therefore even minor improvements in these traits can result in significant breeding gains. Crop breeding for complex traits has been revolutionized by technological developments in genomics and phenomics. Capitalizing on the growing availability of genomics data, genome-wide marker-based prediction models allow for efficient selection of the best parents for the next generation without the need for phenotypic information. Until now such models mostly predict the phenotype directly from the genotype and fail to make use of relevant biological knowledge. It is an open question to what extent the use of such biological knowledge is beneficial for improving genomic prediction accuracy and reliability. In this study, we explored the use of publicly available biological information for genomic prediction of photosynthetic light use efficiency (ΦPSII) and projected leaf area (PLA) in Arabidopsis thaliana. To explore the use of various types of knowledge, we mapped genomic polymorphisms to Gene Ontology (GO) terms and transcriptomics-based gene clusters, and applied these in a Genomic Feature Best Linear Unbiased Predictor (GFBLUP) model, which is an extension to the traditional Genomic BLUP (GBLUP) benchmark. Our results suggest that incorporation of prior biological knowledge can improve genomic prediction accuracy for both ΦPSII and PLA. The improvement achieved depends on the trait, type of knowledge and trait heritability. Moreover, transcriptomics offers complementary evidence to the Gene Ontology for improvement when used to define functional groups of genes. In conclusion, prior knowledge about trait-specific groups of genes can be directly translated into improved genomic prediction.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
| | - Aalt D. J. van Dijk
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Biometris, Wageningen University, Wageningen, Netherlands
| | - Harm Nijveen
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
| | - Mark G. M. Aarts
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
| | - Willem Kruijer
- Biometris, Wageningen University, Wageningen, Netherlands
| | - Thu-Phuong Nguyen
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
| | - Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
|
39
|
Michel S, Wagner C, Nosenko T, Steiner B, Samad-Zamini M, Buerstmayr M, Mayer K, Buerstmayr H. Merging Genomics and Transcriptomics for Predicting Fusarium Head Blight Resistance in Wheat. Genes (Basel) 2021; 12:114. [PMID: 33477759 PMCID: PMC7832326 DOI: 10.3390/genes12010114] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/17/2020] [Revised: 01/14/2021] [Accepted: 01/16/2021] [Indexed: 01/13/2023] Open
Abstract
Genomic selection with genome-wide distributed molecular markers has evolved into a well-implemented tool in many breeding programs during the last decade. The resistance against Fusarium head blight (FHB) in wheat is probably one of the most thoroughly studied systems within this framework. Aside from the genome, other biological strata like the transcriptome have likewise shown some potential in predictive breeding strategies but have not yet been investigated for the FHB-wheat pathosystem. The aims of this study were thus to compare the potential of genomic with transcriptomic prediction, and to assess the merit of blending incomplete transcriptomic with complete genomic data by the single-step method. A substantial advantage of gene expression data over molecular markers was observed for the prediction of FHB resistance in the studied diversity panel of breeding lines and released cultivars. An increase in prediction ability was likewise found for the single-step predictions, although this can mostly be attributed to an increased accuracy among the RNA-sequenced genotypes. The usage of transcriptomics can thus be seen as a complement to already established predictive breeding pipelines with pedigree and genomic data, particularly when more cost-efficient multiplexing techniques for RNA-sequencing become more accessible in the future.
Affiliation(s)
- Sebastian Michel
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
| | - Christian Wagner
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
| | - Tetyana Nosenko
- PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Research Unit Environmental Simulation (EUS) at the Institute of Biochemical Plant Pathology (BIOP), Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Barbara Steiner
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
| | - Mina Samad-Zamini
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
- Saatzucht Edelhof GmbH, 3910 Zwettl, Austria
| | - Maria Buerstmayr
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
| | - Klaus Mayer
- PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany
| | - Hermann Buerstmayr
- Institute of Biotechnology in Plant Production (IFA-Tulln), University of Natural Resources and Life Sciences Vienna, 3430 Tulln, Austria
|
40
|
Morgante F, Huang W, Sørensen P, Maltecca C, Mackay TFC. Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits. G3 (BETHESDA, MD.) 2020; 10:4599-4613. [PMID: 33106232 PMCID: PMC7718734 DOI: 10.1534/g3.120.401847] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/31/2020] [Accepted: 10/12/2020] [Indexed: 02/07/2023]
Abstract
The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11, for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.
Affiliation(s)
- Fabio Morgante
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
- Wen Huang
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
- Peter Sørensen
  - Center of Quantitative Genetics and Genomics and Department of Molecular Biology and Genetics, Aarhus University, Tjele 8830, Denmark
- Christian Maltecca
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695
- Trudy F C Mackay
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
|
41
|
Cruz DF, De Meyer S, Ampe J, Sprenger H, Herman D, Van Hautegem T, De Block J, Inzé D, Nelissen H, Maere S. Using single-plant-omics in the field to link maize genes to functions and phenotypes. Mol Syst Biol 2020; 16:e9667. [PMID: 33346944 PMCID: PMC7751767 DOI: 10.15252/msb.20209667] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/28/2020] [Revised: 10/29/2020] [Accepted: 11/17/2020] [Indexed: 12/14/2022] Open
Abstract
Most of our current knowledge on plant molecular biology is based on experiments in controlled laboratory environments. However, translating this knowledge from the laboratory to the field is often not straightforward, in part because field growth conditions are very different from laboratory conditions. Here, we test a new experimental design to unravel the molecular wiring of plants and study gene-phenotype relationships directly in the field. We molecularly profiled a set of individual maize plants of the same inbred background grown in the same field and used the resulting data to predict the phenotypes of individual plants and the function of maize genes. We show that the field transcriptomes of individual plants contain as much information on maize gene function as traditional laboratory-generated transcriptomes of pooled plant samples subject to controlled perturbations. Moreover, we show that field-generated transcriptome and metabolome data can be used to quantitatively predict individual plant phenotypes. Our results show that profiling individual plants in the field is a promising experimental design that could help narrow the lab-field gap.
Affiliation(s)
- Daniel Felipe Cruz
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Sam De Meyer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Joke Ampe
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Heike Sprenger
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Dorota Herman
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Tom Van Hautegem
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Jolien De Block
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Dirk Inzé
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Hilde Nelissen
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
- Steven Maere
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
|
42
|
Ye S, Li J, Zhang Z. Multi-omics-data-assisted genomic feature markers preselection improves the accuracy of genomic prediction. J Anim Sci Biotechnol 2020; 11:109. [PMID: 33292577 PMCID: PMC7708144 DOI: 10.1186/s40104-020-00515-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/06/2020] [Accepted: 09/22/2020] [Indexed: 12/02/2022] Open
Abstract
Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offers new data layers for genomic prediction but also provides a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, simultaneously using whole-genome sequencing (WGS) and gene expression level data, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel. Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208 ± 0.020 (0.181 ± 0.022) for the startle response and 0.272 ± 0.017 (0.307 ± 0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, both GBLUP and the genomic feature BLUP (GFBLUP) did not improve the prediction accuracy using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, by using SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data. The best accuracy values in the female and male lines were 0.243 ± 0.020 and 0.220 ± 0.022, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in great accuracy and small bias of genomic prediction. Compared with the GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for the starvation resistance and 27.40% and 35.36% for startle response in the female and male lines, respectively. Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
Affiliation(s)
- Shaopan Ye
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Jiaqi Li
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
|
43
|
Ferrão LFV, Johnson TS, Benevenuto J, Edger PP, Colquhoun TA, Munoz PR. Genome-wide association of volatiles reveals candidate loci for blueberry flavor. The New Phytologist 2020; 226:1725-1737. [PMID: 31999829 DOI: 10.1111/nph.16459] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 10/29/2019] [Accepted: 01/21/2020] [Indexed: 05/20/2023]
Abstract
Plants produce a range of volatile organic compounds (VOCs), some of which are perceived by the human olfactory system, contributing to myriad flavors. Despite the importance of flavor for consumer preference, most plant breeding programs have neglected it, mainly because of the costs of phenotyping and the complexity of disentangling the role of VOCs in human perception. To develop molecular breeding tools aimed at improving fruit flavor, we carried out target genotyping of and VOC extraction from a blueberry population. Metabolite genome-wide association analysis was used to elucidate the genetic architecture, while predictive models were tested to prove that VOCs can be accurately predicted using genomic information. A historical sensory panel was considered to assess how the volatiles influenced consumers. By gathering genomics, metabolomics, and the sensory panel, we demonstrated that VOCs are controlled by a few major genomic regions, some of which harbor biosynthetic enzyme-coding genes; can be accurately predicted using molecular markers; and can enhance or decrease consumers' overall liking. Here we emphasized how the understanding of the genetic basis and the role of VOCs in consumer preference can assist breeders in developing more flavorful cultivars at a more inexpensive and accelerated pace.
Affiliation(s)
- Luís Felipe V Ferrão
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
- Timothy S Johnson
  - Environmental Horticulture Department, Plant Innovation Center, University of Florida, Gainesville, FL, 32611, USA
- Juliana Benevenuto
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
- Patrick P Edger
  - Department of Horticulture, University of Michigan, Michigan State University, East Lansing, MI, 48824, USA
- Thomas A Colquhoun
  - Environmental Horticulture Department, Plant Innovation Center, University of Florida, Gainesville, FL, 32611, USA
- Patricio R Munoz
  - Blueberry Breeding and Genomics Lab, Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
|
44
|
Tsai HY, Cericola F, Edriss V, Andersen JR, Orabi J, Jensen JD, Jahoor A, Janss L, Jensen J. Use of multiple traits genomic prediction, genotype by environment interactions and spatial effect to improve prediction accuracy in yield data. PLoS One 2020; 15:e0232665. [PMID: 32401769 PMCID: PMC7219756 DOI: 10.1371/journal.pone.0232665] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/14/2019] [Accepted: 04/20/2020] [Indexed: 11/24/2022] Open
Abstract
Genomic selection has been extensively implemented in plant breeding schemes. Genomic selection incorporates dense genome-wide markers to predict the breeding values for important traits based on information from genotype and phenotype records on traits of interest in a reference population. To date, most relevant investigations have been performed using single trait genomic prediction models (STGP). However, records for several traits at once are usually documented for breeding lines in commercial breeding programs. By incorporating benefits from genetic characterizations of correlated phenotypes, multiple trait genomic prediction (MTGP) may be a useful tool for improving prediction accuracy in genetic evaluations. The objective of this study was to test whether the use of MTGP and proper modeling of spatial effects can improve the prediction accuracy of breeding values in commercial barley and wheat breeding lines. We genotyped 1,317 spring barley and 1,325 winter wheat lines from a commercial breeding program with the Illumina 9K barley and 15K wheat SNP-chip (respectively) and phenotyped them across multiple years and locations. Results showed that the MTGP approach increased correlations between future performance and estimated breeding value of yields by 7% in barley and by 57% in wheat relative to using the STGP approach for each trait individually. Analyses combining genomic data, pedigree information, and proper modeling of spatial effects further increased the prediction accuracy by 4% in barley and 3% in wheat relative to the model using genomic relationships only. Prediction accuracy for yield in wheat and barley breeding was improved by combining MTGP and spatial effects in the model.
Affiliation(s)
- Hsin-Yuan Tsai
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
  - Department of Marine Biotechnology and Resources, National Sun Yat-Sen University, Kaohsiung, Taiwan
- Ahmed Jahoor
  - Nordic Seed, Galten, Denmark
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- Luc Janss
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Just Jensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
|
45
|
Zhou S, Morgante F, Geisz MS, Ma J, Anholt RRH, Mackay TFC. Systems genetics of the Drosophila metabolome. Genome Res 2020; 30:392-405. [PMID: 31694867 PMCID: PMC7111526 DOI: 10.1101/gr.243030.118] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/14/2018] [Accepted: 03/11/2019] [Indexed: 02/06/2023]
Abstract
How effects of DNA sequence variants are transmitted through intermediate endophenotypes to modulate organismal traits remains a central question in quantitative genetics. This problem can be addressed through a systems approach in a population in which genetic polymorphisms, gene expression traits, metabolites, and complex phenotypes can be evaluated on the same genotypes. Here, we focused on the metabolome, which represents the most proximal link between genetic variation and organismal phenotype, and quantified metabolite levels in 40 lines of the Drosophila melanogaster Genetic Reference Panel. We identified sex-specific modules of genetically correlated metabolites and constructed networks that integrate DNA sequence variation and variation in gene expression with variation in metabolites and organismal traits, including starvation stress resistance and male aggression. Finally, we asked to what extent SNPs and metabolites can predict trait phenotypes and generated trait- and sex-specific prediction models that provide novel insights about the metabolomic underpinnings of complex phenotypes.
Affiliation(s)
- Shanshan Zhou
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, USA
- Fabio Morgante
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, USA
- Matthew S Geisz
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, USA
- Junwu Ma
  - Key Laboratory for Animal Biotechnology of Jiangxi Province and the Ministry of Agriculture of China, JiangXi Agricultural University, JiangXi, China
- Robert R H Anholt
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, USA
- Trudy F C Mackay
  - Program in Genetics, W.M. Keck Center for Behavioral Biology and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, USA
|
46
|
Dan Z, Chen Y, Xu Y, Huang J, Huang J, Hu J, Yao G, Zhu Y, Huang W. A metabolome-based core hybridisation strategy for the prediction of rice grain weight across environments. Plant Biotechnology Journal 2019; 17:906-913. [PMID: 30321482 PMCID: PMC6587747 DOI: 10.1111/pbi.13024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/03/2018] [Revised: 08/21/2018] [Accepted: 10/10/2018] [Indexed: 05/05/2023]
Abstract
Marker-based prediction holds great promise for improving current plant and animal breeding efficiencies. However, the predictabilities of complex traits are always severely affected by negative factors, including distant relatedness, environmental discrepancies, unknown population structures, and indeterminate numbers of predictive variables. In this study, we utilised two independent F1 hybrid populations in the years 2012 and 2015 to predict rice thousand grain weight (TGW) using parental untargeted metabolite profiles with a partial least squares regression method. A stable predictive model for TGW was built based on hybrids from the population in 2012 (r = 0.75) but failed to properly predict TGW for hybrids from the population in 2015 (r = 0.27). After integrating hybrids from both populations into the training set, the TGW of hybrids could be predicted but was largely dependent on population structures. Then, core hybrids from each population were determined by principal component analysis and the TGW of hybrids in both environments was successfully predicted (r > 0.60). Moreover, adjusting the population structures and numbers of predictive analytes increased TGW predictability for hybrids in 2015 (r = 0.72). Our study demonstrates that the TGW of F1 hybrids across environments can be accurately predicted based on parental untargeted metabolite profiles with a core hybridisation strategy in rice. Metabolic biomarkers identified from early developmental stage tissues, which are grown under experimental conditions, may represent a workable approach towards the robust prediction of major agronomic traits for climate-adaptive varieties.
Affiliation(s)
- Zhiwu Dan
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Yunping Chen
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Yanghong Xu
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Junran Huang
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Jishuai Huang
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Jun Hu
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Guoxin Yao
  - School of Life and Science Technology, Hubei Engineering University, Xiaogan, China
- Yingguo Zhu
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
- Wenchao Huang
  - State Key Laboratory of Hybrid Rice, Key Laboratory for Research and Utilization of Heterosis in Indica Rice, The Yangtze River Valley Hybrid Rice Collaboration & Innovation Center, College of Life Sciences, Wuhan University, Wuhan, China
|
47
|
Westhues M, Heuer C, Thaller G, Fernando R, Melchinger AE. Efficient genetic value prediction using incomplete omics data. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2019; 132:1211-1222. [PMID: 30656353 DOI: 10.1007/s00122-018-03273-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/24/2018] [Accepted: 12/21/2018] [Indexed: 05/05/2023]
Abstract
Covering a subset of individuals with a quantitative predictor, while imputing records for all others using pedigree or genomic data, could improve the precision of predictions while controlling for costs. Predicting genetic values with high accuracy is pivotal for effective candidate selection in animal and plant breeding. Novel 'omics'-based predictors have been shown to improve upon established genome-based predictions of important complex traits but require laborious and expensive assays. As a consequence, there are various datasets with full genetic marker coverage of all studied individuals but incomplete coverage with other 'omics' data. In animal breeding, single-step prediction was introduced to efficiently combine pedigree information, collected on a large number of animals, with genomic information, collected on a smaller subset of animals, for breeding value estimation without bias. Using two maize datasets of inbred lines and hybrids, we show that the single-step framework facilitates imputing transcriptomic data, boosting forecasts when their predictive ability exceeds that of pedigree or genomic data. Our results suggest that covering only a subset of inbred lines with 'omics' predictors and imputing all others using pedigree or genomic data could enable breeders to improve trait predictions while keeping costs under control. Employing 'omics' predictors could particularly improve candidate selection in hybrid breeding because the success of forecasts is a strongly convex function of predictive ability.
Affiliation(s)
- Matthias Westhues
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
- Claas Heuer
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, 24098, Kiel, Germany
  - Inguran, LLC dba STGenetics, 22575 SH6 South, Navasota, TX, 77868, USA
- Georg Thaller
  - Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, 24098, Kiel, Germany
- Rohan Fernando
  - Department of Animal Science, Iowa State University, Ames, IA, 50011, USA
- Albrecht E Melchinger
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599, Stuttgart, Germany
|
48
|
Li Z, Gao N, Martini JWR, Simianer H. Integrating Gene Expression Data Into Genomic Prediction. Front Genet 2019; 10:126. [PMID: 30858865 PMCID: PMC6397893 DOI: 10.3389/fgene.2019.00126] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/14/2018] [Accepted: 02/04/2019] [Indexed: 01/14/2023] Open
Abstract
Gene expression profiles potentially hold valuable information for the prediction of breeding values and phenotypes. In this study, the utility of transcriptome data for phenotype prediction was tested with 185 inbred lines of Drosophila melanogaster for nine traits in two sexes. We incorporated the transcriptome data into genomic prediction via two methods: GTBLUP and GRBLUP, both combining single nucleotide polymorphisms (SNPs) and transcriptome data. The genotypic data was used to construct the common additive genomic relationship, which was used in genomic best linear unbiased prediction (GBLUP) or jointly in a linear mixed model with a transcriptome-based linear kernel (GTBLUP), or with a transcriptome-based Gaussian kernel (GRBLUP). We studied the predictive ability of the models and discuss a concept of "omics-augmented broad sense heritability" for the multi-omics era. For most traits, GRBLUP and GBLUP provided similar predictive abilities, but GRBLUP explained more of the phenotypic variance. There was only one trait (olfactory perception to Ethyl Butyrate in females) in which the predictive ability of GRBLUP (0.23) was significantly higher than the predictive ability of GBLUP (0.21). Our results suggest that accounting for transcriptome data has the potential to improve genomic predictions if transcriptome data can be included on a larger scale.
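As a rough illustration of the two transcriptome-based kernels named in this abstract (a linear kernel for GTBLUP and a Gaussian kernel for GRBLUP), the NumPy sketch below builds both from a simulated expression matrix. The function names, the per-gene standardization, the scaling by the number of genes, and the `bandwidth` parameter are assumptions made for illustration; they are not taken from the paper, and the mixed-model fitting step is not shown.

```python
import numpy as np

def standardize(expr):
    """Center and scale each gene (column) across lines (rows)."""
    return (expr - expr.mean(axis=0)) / expr.std(axis=0)

def linear_kernel(expr):
    """Linear similarity between lines: Z Z' divided by the number of genes."""
    z = standardize(expr)
    return z @ z.T / z.shape[1]

def gaussian_kernel(expr, bandwidth=1.0):
    """Gaussian similarity from squared Euclidean distances between lines.

    `bandwidth` is a hypothetical tuning parameter; distances are divided
    by the number of genes so values stay on a comparable scale.
    """
    z = standardize(expr)
    sq_norms = np.sum(z ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (z @ z.T)
    sq_dist = np.maximum(sq_dist, 0.0) / z.shape[1]  # clip rounding noise
    return np.exp(-bandwidth * sq_dist)

# Simulated data: 6 lines x 50 genes (purely illustrative, not DGRP data).
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 50))
K_lin = linear_kernel(expr)
K_rbf = gaussian_kernel(expr)
print(K_lin.shape, K_rbf.shape)
```

Either matrix could then stand in for the transcriptome relationship matrix in a mixed model alongside the SNP-based genomic relationship matrix; in practice the bandwidth would need to be tuned, for example by cross-validation.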
Affiliation(s)
- Zhengcao Li
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
- Ning Gao
  - State Key Laboratory of Biocontrol, Guangzhou Higher Education Mega Center, School of Life Science, Sun Yat-sen University, Guangzhou, China
- Henner Simianer
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
|
49
|
Lyra DH, Galli G, Alves FC, Granato ÍSC, Vidotti MS, Bandeira E Sousa M, Morosini JS, Crossa J, Fritsche-Neto R. Modeling copy number variation in the genomic prediction of maize hybrids. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2019; 132:273-288. [PMID: 30382311 DOI: 10.1007/s00122-018-3215-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/04/2018] [Accepted: 10/20/2018] [Indexed: 06/08/2023]
Abstract
Our study indicates that copy variants may play an essential role in the phenotypic variation of complex traits in maize hybrids. Moreover, predicting hybrid phenotypes by combining additive-dominance effects with copy variants has the potential to be a viable predictive model. Non-additive effects resulting from the actions of multiple loci may influence trait variation in single-cross hybrids. In addition, complementation of allelic variation could be a valuable contributor to hybrid genetic variation, especially when crossing inbred lines with higher contents of copy gains. With this in mind, we aimed (1) to study the association between copy number variation (CNV) and hybrid phenotype, and (2) to compare the predictive ability (PA) of additive and additive-dominance genomic best linear unbiased prediction model when combined with the effects of CNV in two datasets of maize hybrids (USP and HELIX). In the USP dataset, we observed a significant negative phenotypic correlation of low magnitude between copy number loss and plant height, revealing a tendency that more copy losses lead to lower plants. In the same set, when CNV was combined with the additive plus dominance effects, the PA significantly increased only for plant height under low nitrogen. In this case, CNV effects explicitly capture relatedness between individuals and add extra information to the model. In the HELIX dataset, we observed a pronounced difference in PA between additive (0.50) and additive-dominance (0.71) models for predicting grain yield, suggesting a significant contribution of dominance. We conclude that copy variants may play an essential role in the phenotypic variation of complex traits in maize hybrids, although the inclusion of CNVs into datasets does not return significant gains concerning PA.
Affiliation(s)
- Danilo Hottis Lyra
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
  - Department of Computational and Analytical Sciences, Rothamsted Research, West Common, Harpenden, AL5 2JQ, UK
- Giovanni Galli
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Filipe Couto Alves
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Ítalo Stefanine Correia Granato
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Miriam Suzane Vidotti
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Massaine Bandeira E Sousa
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- Júlia Silva Morosini
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
- José Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), 06600, Texcoco, D.F, Mexico
- Roberto Fritsche-Neto
  - Department of Genetics, Luiz de Queiroz College of Agriculture, University of São Paulo (ESALQ/USP), Piracicaba, São Paulo, Brazil
|
50
|
Rincent R, Charpentier JP, Faivre-Rampant P, Paux E, Le Gouis J, Bastien C, Segura V. Phenomic Selection Is a Low-Cost and High-Throughput Method Based on Indirect Predictions: Proof of Concept on Wheat and Poplar. G3 (Bethesda, Md.) 2018; 8:3961-3972. [PMID: 30373914 PMCID: PMC6288839 DOI: 10.1534/g3.118.200760] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 09/26/2018] [Accepted: 10/20/2018] [Indexed: 11/29/2022]
Abstract
Genomic selection - the prediction of breeding values using DNA polymorphisms - is a disruptive method that has widely been adopted by animal and plant breeders to increase productivity. It was recently shown that other sources of molecular variations such as those resulting from transcripts or metabolites could be used to accurately predict complex traits. These endophenotypes have the advantage of capturing the expressed genotypes and consequently the complex regulatory networks that occur in the different layers between the genome and the phenotype. However, obtaining such omics data at very large scales, such as those typically experienced in breeding, remains challenging. As an alternative, we proposed using near-infrared spectroscopy (NIRS) as a high-throughput, low cost and non-destructive tool to indirectly capture endophenotypic variants and compute relationship matrices for predicting complex traits, and coined this new approach "phenomic selection" (PS). We tested PS on two species of economic interest (Triticum aestivum L. and Populus nigra L.) using NIRS on various tissues (grains, leaves, wood). We showed that one could reach predictions as accurate as with molecular markers, for developmental, tolerance and productivity traits, even in environments radically different from the one in which NIRS were collected. Our work constitutes a proof of concept and provides new perspectives for the breeding community, as PS is theoretically applicable to any organism at low cost and does not require any molecular information.
Affiliation(s)
- Jean-Paul Charpentier
  - BioForA, INRA, ONF, 45075 Orléans, France
  - GenoBois analytical platform, INRA, 45075 Orléans, France
|