1
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence. G3 (Bethesda, Md.) 2022; 12:jkac258. [PMID: 36161485 PMCID: PMC9635642 DOI: 10.1093/g3journal/jkac258] [Received: 04/09/2022] [Accepted: 09/07/2022]
Abstract
Recent developments have made it possible to generate multiple high-quality 'omics' datasets that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model, GTCBLUP, which aims to remove the between-omics-layer covariance from the predictors, whereas its counterpart, GTBLUP, does not. Although gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed that of the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when the traits were measured closer in time to the sampling of gene transcripts in the liver. In most cases, combining layers did not outperform the best single-omics models for predicting phenotypes. Using only gene transcripts, the gradient boosting machine model outperformed best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model did not produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for predicting phenotypes and the GTCBLUP model for predicting breeding values.
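The following Python sketch gives a rough illustration of the BLUP models named above. It predicts random effects from a SNP-based relationship matrix alone (GBLUP-like) and from SNP plus transcript similarity matrices (in the spirit of GTBLUP). It is a minimal sketch under stated assumptions, not the authors' implementation. SNPs are coded 0/1/2 and the genomic matrix follows VanRaden's first method. The transcript matrix is built from column-standardized expression values, and the variance ratios are supplied by hand rather than estimated by REML. The function names and the simulated data are hypothetical.

```python
import numpy as np


def vanraden_g(snps: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 SNP codes."""
    p = snps.mean(axis=0) / 2.0              # frequency of the counted allele
    z = snps - 2.0 * p                       # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def transcript_t(expr: np.ndarray) -> np.ndarray:
    """Similarity matrix from column-standardized transcript abundances."""
    w = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return w @ w.T / expr.shape[1]


def blup_predict(y: np.ndarray, kernels: list, ratios: list):
    """BLUP for y = 1*mu + sum_k u_k + e, with u_k ~ N(0, K_k * s2_k).

    `ratios` holds s2_k / s2_e for each kernel and is ASSUMED known here
    (in practice the variance components would be estimated, e.g. by REML).
    Returns (sum of predicted random effects, GLS estimate of mu).
    """
    n = len(y)
    v = np.eye(n) + sum(r * k for r, k in zip(ratios, kernels))  # V / s2_e
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(v, y)) / (ones @ np.linalg.solve(v, ones))
    alpha = np.linalg.solve(v, y - mu)       # V^-1 (y - mu), scaled by s2_e
    u_hat = sum(r * (k @ alpha) for r, k in zip(ratios, kernels))
    return u_hat, mu


# Hypothetical simulated data; not the mouse data from the study.
rng = np.random.default_rng(42)
n_animals, n_snps, n_transcripts = 80, 300, 40
snps = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
expr = rng.normal(size=(n_animals, n_transcripts))
y = rng.normal(size=n_animals)

G = vanraden_g(snps)
T = transcript_t(expr)
g_hat, _ = blup_predict(y, [G], [0.4])            # GBLUP-like: SNP kernel only
gt_hat, mu = blup_predict(y, [G, T], [0.4, 0.2])  # GTBLUP-like: SNP + transcript kernels
```

Setting every variance ratio to zero reduces the model to the plain sample mean with no random effects, which makes a quick sanity check. A GTCBLUP-style model would additionally adjust the transcript layer to remove its covariance with the SNP layer before building `T`. That step is not shown here.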
Affiliation(s)
- Bruno C Perez
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
- Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands.
2
Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics 2022; 23:476. [PMID: 35764918 PMCID: PMC9238188 DOI: 10.1186/s12864-022-08690-7] [Received: 11/05/2021] [Accepted: 06/11/2022]
Abstract
Background: Multi-omics data represent a promising link between phenotypes and genome variation, yet few studies have addressed their integration to understand genetic architecture and improve predictability.
Results: Our study used 241 poplar genotypes, phenotyped in two common gardens, with xylem and cambium RNA sequenced at one site, yielding large phenotypic, genomic (SNP), and transcriptomic datasets. Prediction models for each trait were built separately for SNPs and transcripts and compared with a third model that integrated both omics by concatenation. The advantage of integration varied across traits. To understand these differences, an eQTL analysis was performed to characterize the interplay between the genome and the transcriptome and to classify the predicting features into cis or trans relationships. A strong, significant negative correlation was found between the change in predictability and the change in predictor ranking for trans eQTLs, for traits evaluated at the site of transcriptomic sampling.
Conclusions: Consequently, beneficial integration happens when the redundancy of predictors is decreased, likely leaving room for other, less prominent but complementary predictors. An additional gene ontology (GO) enrichment analysis appeared to corroborate this statistical result. To our knowledge, this is a novel finding that delineates a promising method to explore data integration.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08690-7.
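The integration-by-concatenation design described above can be sketched as follows. The abstract does not say which regression model was used, so ridge regression is assumed here purely for illustration. The simulated genotypes, transcripts and phenotypes, the penalty `lam`, and the function names are all hypothetical. Predictive ability is taken as the Pearson correlation between predicted and observed phenotypes in a held-out set.

```python
import numpy as np


def ridge_fit_predict(x_train, y_train, x_test, lam=10.0):
    """Ridge regression on standardized features; returns predictions for x_test.

    Features are standardized with training-set statistics only, so that no
    information from the test set leaks into the model.
    """
    mean, sd = x_train.mean(axis=0), x_train.std(axis=0)
    sd[sd == 0] = 1.0                          # constant columns stay at zero
    xs_train = (x_train - mean) / sd
    xs_test = (x_test - mean) / sd
    y_mean = y_train.mean()
    p = xs_train.shape[1]
    beta = np.linalg.solve(xs_train.T @ xs_train + lam * np.eye(p),
                           xs_train.T @ (y_train - y_mean))
    return y_mean + xs_test @ beta


def predictive_ability(x, y, train, test, lam=10.0):
    """Pearson correlation between predicted and observed test phenotypes."""
    pred = ridge_fit_predict(x[train], y[train], x[test], lam)
    return np.corrcoef(pred, y[test])[0, 1]


# 241 mirrors the poplar panel size; everything else is synthetic.
rng = np.random.default_rng(0)
n, n_snps, n_tx = 241, 500, 200
snps = rng.integers(0, 3, size=(n, n_snps)).astype(float)
tx = rng.normal(size=(n, n_tx))
y = 0.1 * snps[:, :20].sum(axis=1) + 0.3 * tx[:, :10].sum(axis=1) + rng.normal(size=n)

idx = rng.permutation(n)
train, test = idx[:180], idx[180:]
concat = np.hstack([snps, tx])                 # integration by concatenating both omics

for name, x in [("SNPs", snps), ("transcripts", tx), ("concatenated", concat)]:
    print(name, round(predictive_ability(x, y, train, test), 3))
```

The concatenated matrix simply stacks both layers column-wise, so a single penalty shrinks SNP and transcript effects alike. With real data, one might scale or tune each layer separately.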
3
Liu YH, Zhang M, Scheuring CF, Cilkiz M, Sze SH, Smith CW, Murray SC, Xu W, Zhang HB. Accurate prediction of complex traits for individuals and offspring from parents using a simple, rapid, and efficient method for gene-based breeding in cotton and maize. Plant Science: An International Journal of Experimental Plant Biology 2022; 316:111153. [PMID: 35151437 DOI: 10.1016/j.plantsci.2021.111153] [Received: 11/02/2021] [Accepted: 12/11/2021]
Abstract
Accurate, simple, rapid, and inexpensive prediction of complex traits controlled by numerous genes is paramount to enhanced plant breeding, animal breeding, and human medicine. Here we report a novel method that enables accurate, simple, and rapid prediction of complex traits of individuals, or of offspring from their parents, based on the number of favorable alleles (NFAs) of the genes controlling the target traits. The NFAs of 226 cotton fiber length (GFL) genes and nine maize hybrid grain yield-related (ZmF1GY) genes were used directly to predict, respectively, the fiber lengths of individual cotton plants and the grain yields of maize F1 hybrids from their parents, with prediction model-based methods as controls. The NFAs of the 226 GFL genes predicted cotton fiber length with an accuracy of 0.85, matching the model-based methods and outperforming genomic prediction by 82%–170%. The NFAs of the nine ZmF1GY genes predicted the grain yields of maize hybrids from their parents with an accuracy of 0.80, outperforming genomic prediction by 67%. Moreover, the prediction accuracies for these traits were consistent across years, environments, and eco-agricultural systems. Importantly, because these traits are predicted directly from the NFAs of the genes, breeding can be performed in the greenhouse or phytotron, or off-season. It does not require the model training and validation steps that are essential to, and costly for, model-based genomic or genic prediction. Therefore, this new method dramatically outperforms the current model-based genomic methods used for phenotype prediction and streamlines the breeding process, promising to substantially enhance current plant and animal breeding.
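The core quantity in this method, the number of favorable alleles, is simple to compute. The sketch below is a hypothetical illustration, not the authors' code. It assumes biallelic genes coded 0/1/2 and a known favorable allele at each gene. For the hybrid case, it also assumes fully inbred parents, so that an F1 receives exactly one allele from each parent. Accuracy is expressed as the Pearson correlation between NFA and phenotype. The function names, toy genotypes and phenotypes are invented.

```python
import numpy as np


def count_favorable_alleles(genotypes, favorable_is_counted):
    """Number of favorable alleles (NFA) per individual.

    genotypes: n x g array giving 0/1/2 copies of the *counted* allele per gene.
    favorable_is_counted: length-g boolean array; True where the counted allele
    is the favorable one, False where the other allele is favorable.
    """
    g = np.asarray(genotypes)
    fav = np.where(favorable_is_counted, g, 2 - g)   # recode to favorable-allele copies
    return fav.sum(axis=1)


def f1_from_inbred_parents(parent1, parent2):
    """Expected F1 genotype when both parents are fully homozygous (0 or 2)."""
    p1, p2 = np.asarray(parent1), np.asarray(parent2)
    if not (np.isin(p1, (0, 2)).all() and np.isin(p2, (0, 2)).all()):
        raise ValueError("parents must be homozygous (0 or 2) at every gene")
    return (p1 + p2) // 2                            # one allele from each parent


# Toy data: three individuals, three genes (hypothetical values).
genotypes = np.array([[0, 1, 2],
                      [2, 2, 0],
                      [1, 0, 1]])
favorable_is_counted = np.array([True, False, True])
nfa = count_favorable_alleles(genotypes, favorable_is_counted)   # -> [3, 2, 4]
phenotype = np.array([30.1, 28.0, 31.9])
r = np.corrcoef(nfa, phenotype)[0, 1]            # accuracy as Pearson r

f1 = f1_from_inbred_parents([0, 2, 2], [2, 0, 0])                # -> [1, 1, 1]
f1_nfa = count_favorable_alleles(f1[None, :], favorable_is_counted)[0]   # -> 3
```

No model is fitted here: the NFA itself serves as the predictor, which mirrors the abstract's point that the method skips model training and validation.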
Affiliation(s)
- Yun-Hua Liu
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Meiping Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Chantel F Scheuring
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Mustafa Cilkiz
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Sing-Hoi Sze
- Department of Computer Science and Engineering and Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77843, USA
- C Wayne Smith
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Seth C Murray
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Wenwei Xu
- Texas A&M AgriLife Research, Lubbock, TX 79403, USA
- Hong-Bin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA.
4
A novel computational approach for predicting complex phenotypes in Drosophila (starvation-sensitive and sterile) by deriving their gene expression signatures from public data. PLoS One 2020; 15:e0240824. [PMID: 33104720 PMCID: PMC7588067 DOI: 10.1371/journal.pone.0240824] [Received: 02/06/2020] [Accepted: 10/05/2020]
Abstract
Many research teams perform numerous genetic, transcriptomic, proteomic and other types of omic experiments to understand the molecular, cellular and physiological mechanisms of disease and health. Often (but not always), the results of these experiments are deposited in publicly available repository databases. These data records often include phenotypic characteristics following genetic and environmental perturbations, with the aim of discovering the underlying molecular mechanisms leading to the phenotypic responses. Usually only a constrained set of phenotypic characteristics is recorded, mostly those that are hypothesis-driven or that can be recorded within financial or practical constraints. We present a novel proof-of-principle computational approach for combining publicly available gene-expression data from control/mutant animal experiments that exhibit a particular phenotype. We then use this approach to predict unobserved phenotypic characteristics in new experiments (data derived from EBI’s ArrayExpress and ExpressionAtlas, respectively). We used available microarray gene-expression data for two phenotypes (starvation-sensitive and sterile) in Drosophila. The data were combined using a linear mixed-effects model, with the inclusion of consecutive principal components to account for variability between experiments, in conjunction with Gene Ontology enrichment analysis. We show how available data can be ranked by random forest according to their likelihood of exhibiting these two phenotypes. The results of our study show that it is possible to integrate seemingly different gene-expression microarray data and predict a potential phenotypic manifestation with a relatively high degree of confidence (>80% AUC). This provides hitherto unexplored opportunities for inferring unknown and unbiased phenotypic characteristics from already performed experiments, in order to identify studies for future analyses.
Molecular mechanisms associated with gene and environment perturbations are intrinsically linked and give rise to a variety of phenotypic manifestations. Therefore, unravelling the phenotypic spectrum can help to gain insights into disease mechanisms associated with gene and environmental perturbations. Our approach uses public data that are set to increase in volume, thus providing value for money.
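The >80% AUC reported above refers to the area under the ROC curve. The sketch below is a hedged illustration and not the authors' pipeline, which ranked data using random forest. It shows that AUC can be computed from any set of classifier scores as the probability that a randomly chosen positive experiment scores higher than a randomly chosen negative one (the Mann-Whitney formulation). The function name, scores and labels are hypothetical.

```python
import numpy as np


def auc_mann_whitney(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical classifier scores for six experiments (1 = phenotype observed).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))   # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.8 means most positive/negative pairs are ordered correctly.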
5
Liu YH, Xu Y, Zhang M, Cui Y, Sze SH, Smith CW, Xu S, Zhang HB. Accurate Prediction of a Quantitative Trait Using the Genes Controlling the Trait for Gene-Based Breeding in Cotton. Frontiers in Plant Science 2020; 11:583277. [PMID: 33281846 PMCID: PMC7690289 DOI: 10.3389/fpls.2020.583277] [Received: 07/14/2020] [Accepted: 10/15/2020]
Abstract
Accurate phenotype prediction of quantitative traits is paramount to enhanced plant research and breeding. Here, we report the accurate prediction of cotton fiber length, a typical quantitative trait, using 474 cotton (Gossypium spp.) fiber length (GFL) genes and nine prediction models. When the SNPs/InDels contained in 226 of the GFL genes, or the expression levels of all 474 GFL genes, were used for fiber length prediction, a prediction accuracy of r = 0.83 was obtained, approaching the maximum possible prediction accuracy for a quantitative trait. This is a 116% improvement over the fiber length prediction accuracies achieved thus far by genomic selection using genome-wide random DNA markers. Moreover, analysis of the GFL genes identified 125 GFL genes that are key to accurate prediction of fiber length. Using these 125 genes alone gave a prediction accuracy similar to that of all 474 GFL genes. The fiber lengths predicted from the expression of the 125 key GFL genes were significantly correlated with those predicted from the SNPs/InDels of the 226 SNP/InDel-containing GFL genes (r = 0.892, P < 0.001). The prediction accuracies of fiber length using both genic datasets were highly consistent across environments and generations. Finally, we found that a training population of 100–120 plants was sufficient to train a model for accurate prediction of a quantitative trait using the genes controlling the trait. Therefore, the genes controlling a quantitative trait are capable of accurately predicting its phenotype, thereby dramatically improving the ability, accuracy, and efficiency of phenotype prediction and promoting gene-based breeding in cotton and other species.
Affiliation(s)
- Yun-Hua Liu
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States
- Yang Xu
- Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States
- Meiping Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States
- Yanru Cui
- Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States
- Sing-Hoi Sze
- Department of Computer Science and Engineering and Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, United States
- C. Wayne Smith
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States
- Shizhong Xu
- Botany and Plant Sciences, University of California, Riverside, Riverside, CA, United States
- *Correspondence: Shizhong Xu
- Hong-Bin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States
- *Correspondence: Hong-Bin Zhang
6
Bernardet C, Tambutté E, Techer N, Tambutté S, Venn AA. Ion transporter gene expression is linked to the thermal sensitivity of calcification in the reef coral Stylophora pistillata. Sci Rep 2019; 9:18676. [PMID: 31822787 PMCID: PMC6904480 DOI: 10.1038/s41598-019-54814-7] [Received: 06/14/2019] [Accepted: 10/21/2019]
Abstract
Coral calcification underpins biodiverse reef ecosystems, but the physiology underlying the thermal sensitivity of corals to changing seawater temperatures remains unclear. Furthermore, light is also a key factor in modulating calcification rates, but a mechanistic understanding of how light interacts with temperature to affect coral calcification is lacking. Here, we characterized the thermal performance curve (TPC) of calcification of the widespread model coral species Stylophora pistillata. We also used gene expression analysis to investigate the role of ion transport mechanisms in thermally driven declines in daytime and nighttime calcification. Focusing on genes linked to the transport of dissolved inorganic carbon (DIC), calcium and H+, our study reveals a high degree of coherence between physiological responses (e.g., calcification and respiration) and distinct gene expression patterns at the different temperatures under day and night conditions. At low temperatures, calcification and the expression of genes linked to DIC transport processes were downregulated but showed little response to light. By contrast, at elevated temperature, light had a positive effect on calcification and stimulated a more functionally diverse gene expression response of ion transporters. Overall, our findings highlight the role of mechanisms linked to DIC, calcium and H+ transport in the thermal sensitivity of coral calcification and how this sensitivity is influenced by light.
Affiliation(s)
- C Bernardet
- Centre Scientifique de Monaco, Marine Biology Department, 8 Quai Antoine 1er, Monaco, 98000, Monaco
- Sorbonne Université, Collège Doctoral, F-75005, Paris, France
- E Tambutté
- Centre Scientifique de Monaco, Marine Biology Department, 8 Quai Antoine 1er, Monaco, 98000, Monaco
- S Tambutté
- Centre Scientifique de Monaco, Marine Biology Department, 8 Quai Antoine 1er, Monaco, 98000, Monaco
- A A Venn
- Centre Scientifique de Monaco, Marine Biology Department, 8 Quai Antoine 1er, Monaco, 98000, Monaco.
7
Harel T, Peshes-Yaloz N, Bacharach E, Gat-Viks I. Predicting Phenotypic Diversity from Molecular and Genetic Data. Genetics 2019; 213:297-311. [PMID: 31352366 PMCID: PMC6727812 DOI: 10.1534/genetics.119.302463] [Received: 12/20/2018] [Accepted: 07/04/2019]
Abstract
Despite the importance of complex phenotypes, an in-depth understanding of the combined molecular and genetic effects on a phenotype has yet to be achieved. Here, we introduce InPhenotype, a novel computational approach for complex phenotype prediction in which gene-expression data and genotyping data are integrated to yield quantitative predictions of complex physiological traits. Unlike existing computational methods, InPhenotype makes it possible to model potential regulatory interactions between gene expression and genomic loci without compromising the continuous nature of the molecular data. We applied InPhenotype to synthetic data, demonstrating its utility across different data parameters. These tests also showed its superiority over current methods, both in prediction quality and in the ability to detect regulatory interactions between genes and genomic loci. Finally, we show that InPhenotype can provide biological insights into both mouse and yeast datasets.
Affiliation(s)
- Tom Harel
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Naama Peshes-Yaloz
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Eran Bacharach
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
- Irit Gat-Viks
- School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801 Israel
8
Duhoux A, Carrère S, Duhoux A, Délye C. Transcriptional markers enable identification of rye-grass (Lolium sp.) plants with non-target-site-based resistance to herbicides inhibiting acetolactate-synthase. Plant Science: An International Journal of Experimental Plant Biology 2017; 257:22-36. [PMID: 28224916 DOI: 10.1016/j.plantsci.2017.01.009] [Received: 11/22/2016] [Revised: 01/12/2017] [Accepted: 01/17/2017]
Abstract
Molecular detection of herbicide non-target-site-based resistance (NTSR) classically requires extensively validated NTSR genes. We assessed the feasibility of predicting NTSR phenotypes using expression data of NTSR transcriptional markers, i.e., transcripts whose expression levels are statistically correlated with NTSR. Markers were sought by comparative RNA-Seq analysis of untreated NTSR or sensitive plants from four rye-grass populations. This was followed by expression quantification in 299 individual plants with characterised sensitivity to two acetolactate-synthase-inhibiting herbicides. Multivariate analyses were implemented to predict NTSR using combined marker expression data. Nineteen markers (four cytochromes P450, four glutathione-S-transferases, three glycosyltransferases, two ABC transporters, two hydrolases, one aldolase, one peptidase, one transferase and one esterase) that were significantly more highly expressed in NTSR plants were identified. Expression was highest in the most resistant plants. Some markers appeared to be co-regulated. Combined marker expression data enabled prediction of NTSR phenotypes in individual plants or of resistant-plant frequencies in populations. Thus, NTSR detection based on transcriptional markers proved feasible. Accuracy could be improved by identifying additional markers, especially markers associated with NTSR regulation. Additionally, our data suggest that NTSR mechanisms emerged in different populations via redundant evolution, and that NTSR can evolve through selection for higher constitutive expression of whole herbicide-response pathways.
Affiliation(s)
- Arnaud Duhoux
- INRA, Agroécologie, 17 rue Sully, F-21000, Dijon, France
- Alexis Duhoux
- INRA, Agroécologie, 17 rue Sully, F-21000, Dijon, France