1
Santos MA, Carromeu-Santos A, Quina AS, Antunes MA, Kristensen TN, Santos M, Matos M, Fragata I, Simões P. Experimental Evolution in a Warming World: The Omics Era. Mol Biol Evol 2024; 41:msae148. [PMID: 39034684 PMCID: PMC11331425 DOI: 10.1093/molbev/msae148] [Received: 11/29/2023] [Revised: 06/25/2024] [Accepted: 07/12/2024] [Indexed: 07/23/2024] Open
Abstract
A comprehensive understanding of the genetic mechanisms that shape species responses to thermal variation is essential for more accurate predictions of the impacts of climate change on biodiversity. Experimental evolution with high-throughput resequencing approaches (evolve and resequence) is a highly effective tool that has been increasingly employed to elucidate the genetic basis of adaptation. The number of thermal evolve and resequence studies is rising, yet there is a dearth of efforts to integrate this new wealth of knowledge. Here, we review this literature, showing how these studies have increased our understanding of the genetic basis of thermal adaptation. We identify two major trends: a highly polygenic basis of thermal adaptation and a general lack of consistency in candidate targets of selection between studies. These findings indicate that the adaptive responses to specific environments are rather independent. A review of the literature reveals several gaps in the existing research. Firstly, there is a paucity of studies on organisms from diverse taxa. Secondly, there is a need to apply more dynamic and ecologically relevant thermal environments. Thirdly, there is a lack of studies that integrate genomic changes with changes in life history and behavioral traits. Addressing these issues would allow a more in-depth understanding of the relationship between genotype and phenotype. We highlight key methodological aspects that can address some of the limitations and omissions identified. These include the need for greater standardization of methodologies and the utilization of new technologies focusing on the integration of genomic and phenotypic variation in the context of thermal adaptation.
Affiliation(s)
- Marta A Santos
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Ana Carromeu-Santos
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Ana S Quina
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Egas Moniz Center for Interdisciplinary Research (CiiEM), Egas Moniz School of Health & Science, Almada, Portugal
- Marta A Antunes
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Mauro Santos
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departament de Genètica i de Microbiologia, Grup de Genòmica, Bioinformàtica i Biologia Evolutiva (GBBE), Universitat Autonòma de Barcelona, Bellaterra, Spain
- Margarida Matos
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Inês Fragata
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Pedro Simões
  - CE3C—Centre for Ecology, Evolution and Environmental Changes & CHANGE, Global Change and Sustainability Institute, Lisboa, Portugal
  - Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
2
Nascimento M, Nascimento ACC, Azevedo CF, de Oliveira ACB, Caixeta ET, Jarquin D. Enhancing genomic prediction with Stacking Ensemble Learning in Arabica Coffee. Front Plant Sci 2024; 15:1373318. [PMID: 39086911 PMCID: PMC11288849 DOI: 10.3389/fpls.2024.1373318] [Received: 01/19/2024] [Accepted: 06/12/2024] [Indexed: 08/02/2024]
Abstract
Coffee breeding programs have traditionally relied on observing plant characteristics over years, a slow and costly process. Genomic selection (GS) offers a DNA-based alternative for faster selection of superior cultivars. Stacking Ensemble Learning (SEL) combines multiple models for potentially even more accurate selection. This study explores the potential of SEL in coffee breeding, aiming to improve prediction accuracy for important traits [yield (YL), total number of fruits (NF), leaf miner infestation (LM), and cercosporiosis incidence (Cer)] in Coffea arabica. We analyzed data from 195 individuals genotyped for 21,211 single-nucleotide polymorphism (SNP) markers. To comprehensively assess model performance, we employed a cross-validation (CV) scheme. Genomic Best Linear Unbiased Prediction (GBLUP), multivariate adaptive regression splines (MARS), Quantile Random Forest (QRF), and Random Forest (RF) served as base learners. For the meta-learner within the SEL framework, various options were explored, including Ridge Regression, RF, GBLUP, and Single Average. The SEL method was able to predict important traits in Coffea arabica, with higher predictive ability (PA) than all base learner methods. The gains in PA relative to GBLUP (computed from the ratio between the PA of the best stacking model and that of GBLUP) were 87.44%, 37.83%, 199.82%, and 14.59% for YL, NF, LM, and Cer, respectively. Overall, SEL presents a promising approach for GS. By combining predictions from multiple models, SEL can potentially enhance the PA of GS for complex traits.
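The stacking idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the two base learners (a mean predictor and a single-predictor least-squares line), all function names, and the toy data are assumptions chosen to keep the example free of dependencies. It shows out-of-fold predictions from the base learners being used as inputs to a ridge-regression meta-learner.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, so results are reproducible)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(x, y):
    """Hypothetical base learner 1: always predicts the training mean."""
    mu = sum(y) / len(y)
    return lambda xi: mu


def fit_line(x, y):
    """Hypothetical base learner 2: least-squares line on a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda xi: my + slope * (xi - mx)


def out_of_fold_predictions(x, y, learners, k):
    """Predict each sample with models trained on the other folds only."""
    n = len(x)
    oof = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        x_train = [x[i] for i in range(n) if i not in held_out]
        y_train = [y[i] for i in range(n) if i not in held_out]
        for j, fit in enumerate(learners):
            model = fit(x_train, y_train)
            for i in fold:
                oof[i][j] = model(x[i])
    return oof


def fit_ridge_meta(oof, y, penalty=0.0):
    """Ridge meta-learner for exactly two base learners (2x2 normal equations, no intercept)."""
    a11 = sum(r[0] * r[0] for r in oof) + penalty
    a12 = sum(r[0] * r[1] for r in oof)
    a22 = sum(r[1] * r[1] for r in oof) + penalty
    b1 = sum(r[0] * t for r, t in zip(oof, y))
    b2 = sum(r[1] * t for r, t in zip(oof, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base-learner predictions are collinear; increase penalty")
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]


# Toy data (invented for illustration): the response is exactly linear in x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 1.0 for xi in x]
oof = out_of_fold_predictions(x, y, [fit_mean, fit_line], k=3)
weights = fit_ridge_meta(oof, y)
```

With this toy data the meta-learner places essentially all of its weight on the line learner, because that learner's out-of-fold predictions reproduce the response. Training the meta-learner on out-of-fold rather than in-sample predictions is what keeps it from simply rewarding the base learner that overfits most.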
Affiliation(s)
- Moyses Nascimento
  - Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
  - Agronomy Department, University of Florida, Gainesville, FL, United States
- Ana Carolina Campana Nascimento
  - Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
  - Agronomy Department, University of Florida, Gainesville, FL, United States
- Camila Ferreira Azevedo
  - Laboratory of Intelligence Computational and Statistical Learning (LICAE), Department of Statistics, Federal University of Viçosa, Viçosa, Brazil
- Diego Jarquin
  - Agronomy Department, University of Florida, Gainesville, FL, United States
3
Ali B, Huguenin-Bizot B, Laurent M, Chaumont F, Maistriaux LC, Nicolas S, Duborjal H, Welcker C, Tardieu F, Mary-Huard T, Moreau L, Charcosset A, Runcie D, Rincent R. High-dimensional multi-omics measured in controlled conditions are useful for maize platform and field trait predictions. Theor Appl Genet 2024; 137:175. [PMID: 38958724 DOI: 10.1007/s00122-024-04679-w] [Received: 03/20/2024] [Accepted: 06/15/2024] [Indexed: 07/04/2024]
Abstract
KEY MESSAGE: Transcriptomics and proteomics information collected on a platform can predict additive and non-additive effects for platform traits and additive effects for field traits. The effects of climate change in the form of drought, heat stress, and irregular seasonal changes threaten global crop production. The ability of multi-omics data, such as transcripts and proteins, to reflect a plant's response to such climatic factors can be capitalized on in prediction models to maximize crop improvement. Implementing multi-omics characterization in field evaluations is challenging due to high costs. It is, however, possible to do it on reference genotypes in controlled conditions. Using omics measured on a platform, we tested different multi-omics-based prediction approaches, using a high-dimensional linear mixed model (MegaLMM) to predict genotypes for platform traits and agronomic field traits in a panel of 244 maize hybrids. We considered two prediction scenarios: in the first one, new hybrids are predicted (CV-NH), and in the second one, partially observed hybrids are predicted (CV-POH). For both scenarios, all hybrids were characterized for omics on the platform. We observed that omics can predict both additive and non-additive genetic effects for the platform traits, resulting in much higher predictive abilities than GBLUP. This highlights the efficiency of omics in capturing regulatory processes in relation to growth conditions. For the field traits, we observed that the additive components of omics only slightly improved predictive abilities for predicting new hybrids (CV-NH, model MegaGAO) and for predicting partially observed hybrids (CV-POH, model GAOxW-BLUP) in comparison to GBLUP. We conclude that measuring the omics in the field would be of considerable interest in predicting productivity if the costs of omics drop significantly.
Affiliation(s)
- Baber Ali
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
- Bertrand Huguenin-Bizot
  - Laboratoire Reproduction Et Développement Des Plantes, CNRS, ENS de Lyon-46, Allée d'Italie, 69364, Lyon, France
- Maxime Laurent
  - Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
- François Chaumont
  - Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
- Laurie C Maistriaux
  - Louvain Institute of Biomolecular Science and Technology, UCLouvain, Louvain-La-Neuve, Belgium
- Stéphane Nicolas
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
- Hervé Duborjal
  - Limagrain, Limagrain Fields Seeds, Research Centre, 63720, Chappes, France
- Tristan Mary-Huard
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
- Laurence Moreau
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
- Alain Charcosset
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
- Daniel Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Renaud Rincent
  - INRAE, CNRS, AgroParisTech, GQE-Le Moulon, Université Paris-Saclay, 91190, Gif-Sur-Yvette, France
4
Zhang Y, Zhuang Z, Liu Y, Huang J, Luan M, Zhao X, Dong L, Ye J, Yang M, Zheng E, Cai G, Wu Z, Yang J. Genomic prediction based on preselected single-nucleotide polymorphisms from genome-wide association study and imputed whole-genome sequence data annotation for growth traits in Duroc pigs. Evol Appl 2024; 17:e13651. [PMID: 38362509 PMCID: PMC10868536 DOI: 10.1111/eva.13651] [Received: 12/11/2022] [Revised: 10/31/2023] [Accepted: 01/13/2024] [Indexed: 02/17/2024] Open
Abstract
The use of whole-genome sequence (WGS) data is expected to improve the genomic prediction (GP) power for complex traits because it may contain mutations that are in strong linkage disequilibrium with causal mutations. However, a few previous studies have shown no or small improvement in prediction accuracy using WGS data. Incorporating prior biological information into GP seems to be an attractive strategy that might improve prediction accuracy. In this study, a total of 6334 pigs were genotyped using 50K chips and subsequently imputed to the WGS level. This cohort includes two prior discovery populations that comprise 294 Landrace pigs and 186 Duroc pigs, as well as two validation populations that consist of 3770 American Duroc pigs and 2084 Canadian Duroc pigs. We then used annotation information and genome-wide association study (GWAS) results from the WGS data to perform GP for six growth traits in two Duroc pig populations. Based on variant annotation, we partitioned the imputed WGS data into different genomic classes, such as intron, intergenic, and untranslated regions. Based on GWAS results of WGS data, we obtained trait-associated single-nucleotide polymorphisms (SNPs). We then applied the genomic feature best linear unbiased prediction (GFBLUP) and genomic best linear unbiased prediction (GBLUP) models to estimate the genomic estimated breeding values for growth traits with these different variant panels, including six genomic classes and trait-associated SNPs. Compared with 50K chip data, GBLUP with imputed WGS data had no increase in prediction accuracy. Using only annotations resulted in no increase in prediction accuracy compared to GBLUP with 50K, but adding annotation information into the GFBLUP model with imputed WGS data could improve the prediction accuracy with increases of 0.00%-2.82%. In conclusion, a GFBLUP model that incorporated prior biological information might increase the advantage of using imputed WGS data for GP.
Affiliation(s)
- Yuling Zhang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Zhanwei Zhuang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Yiyi Liu
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Jinyan Huang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Menghao Luan
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Xiang Zhao
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Linsong Dong
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Jian Ye
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Ming Yang
  - College of Animal Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, China
- Enqin Zheng
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Gengyuan Cai
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
- Zhenfang Wu
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
  - Guangdong Zhongxin Breeding Technology Co., Ltd, Guangzhou, China
- Jie Yang
  - College of Animal Science and National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China
  - Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, China
5
Onogi A. A Bayesian model for genomic prediction using metabolic networks. Bioinform Adv 2023; 3:vbad106. [PMID: 39131740 PMCID: PMC11312854 DOI: 10.1093/bioadv/vbad106] [Received: 05/12/2023] [Revised: 07/26/2023] [Accepted: 08/10/2023] [Indexed: 08/13/2024]
Abstract
Motivation: Genomic prediction is now an essential technique in breeding and medicine, and it is interesting to see how omics data can be used to improve prediction accuracy. Previous work proposed a metabolic network-based method for biomass prediction in Arabidopsis; however, the method consists of multiple steps that may degrade prediction accuracy. Results: We proposed a Bayesian model that integrates all steps and jointly infers all fluxes of reactions related to biomass production. The proposed model showed higher accuracies than the compared methods on both simulated and real data. The findings support the previous excellent idea that metabolic network information can be used for prediction. Availability and implementation: All R and Stan scripts to reproduce the results of this study are available at https://github.com/Onogi/MetabolicModeling.
Affiliation(s)
- Akio Onogi
  - Department of Life Sciences, Faculty of Agriculture, Ryukoku University, Otsu, Shiga 520-2194, Japan
6
Zhao W, Qadri QR, Zhang Z, Wang Z, Pan Y, Wang Q, Zhang Z. PyAGH: a python package to fast construct kinship matrices based on different levels of omic data. BMC Bioinformatics 2023; 24:153. [PMID: 37072709 PMCID: PMC10111838 DOI: 10.1186/s12859-023-05280-6] [Received: 12/03/2022] [Accepted: 04/10/2023] [Indexed: 04/20/2023] Open
Abstract
BACKGROUND: Construction of kinship matrices among individuals is an important step for both association studies and prediction studies based on different levels of omic data. Methods for constructing kinship matrices are becoming more diverse, and each method suits particular scenarios. However, software that can comprehensively calculate kinship matrices for a variety of scenarios is still urgently needed. RESULTS: In this study, we developed an efficient and user-friendly Python module, PyAGH, that can accomplish (1) conventional additive kinship matrix construction based on pedigree, genotypes, or abundance data from the transcriptome or microbiome; (2) genomic kinship matrix construction in combined populations; (3) kinship matrix construction for dominant and epistatic effects; (4) pedigree selection, tracing, detection and visualization; (5) visualization of cluster, heatmap and PCA analyses based on kinship matrices. The output from PyAGH can be easily integrated into other mainstream software according to users' purposes. Compared with other software, PyAGH integrates multiple methods for calculating the kinship matrix and has advantages in terms of speed and data size. PyAGH is developed in Python and C++ and can be easily installed with the pip tool. Installation instructions and a manual are freely available from https://github.com/zhaow-01/PyAGH. CONCLUSION: PyAGH is a fast and user-friendly Python package for calculating kinship matrices using pedigree, genotype, microbiome and transcriptome data, as well as for processing, analyzing and visualizing data and results. This package makes it easier to perform prediction and association studies based on different levels of omic data.
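To make concrete what an additive genomic kinship matrix is, the sketch below computes one with VanRaden's first method, G = ZZ' / (2 Σ p(1 - p)), in plain Python. This is not PyAGH's API: the function name and the toy genotypes are assumptions, and the abstract does not say which estimator PyAGH uses by default.

```python
def vanraden_grm(genotypes):
    """Additive genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per marker.
    Returns an n x n matrix (list of lists), where n is the number of individuals.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Scaling term 2 * sum(p * (1 - p)) puts G on the scale of pedigree kinship.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    # Center each marker by twice its allele frequency (Z = M - 2p).
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
        for zi in centered
    ]


# Toy genotypes (invented): 3 individuals x 4 markers, every allele frequency is 0.5.
toy = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 1, 1, 0],
]
G = vanraden_grm(toy)
```

For this toy input the diagonal entries are 1.0 and the off-diagonal entries are -0.5. Negative off-diagonals are expected here: centering by allele frequencies estimated from the same sample makes every row of G sum to zero.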
Affiliation(s)
- Wei Zhao
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Qamar Raza Qadri
  - Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Zhenyang Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhen Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Yuchun Pan
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
  - Hainan Research Institute, Zhejiang University, 11# Yonyou Industrial Park, Yazhou Bay Science and Technology City, Sanya, 572025, China
- Qishan Wang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhe Zhang
  - Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
7
Sun G, Yu H, Wang P, Lopez-Guerrero M, Mural RV, Mizero ON, Grzybowski M, Song B, van Dijk K, Schachtman DP, Zhang C, Schnable JC. A role for heritable transcriptomic variation in maize adaptation to temperate environments. Genome Biol 2023; 24:55. [PMID: 36964601 PMCID: PMC10037803 DOI: 10.1186/s13059-023-02891-3] [Received: 01/28/2022] [Accepted: 03/06/2023] [Indexed: 03/26/2023] Open
Abstract
Background: Transcription bridges genetic information and phenotypes. Here, we evaluated how changes in transcriptional regulation enable maize (Zea mays), a crop originally domesticated in the tropics, to adapt to temperate environments. Results: We generated 572 unique RNA-seq datasets from the roots of 340 maize genotypes. Genes involved in core processes such as cell division, chromosome organization and cytoskeleton organization showed lower heritability of gene expression, while genes involved in anti-oxidation activity exhibited higher expression heritability. An expression genome-wide association study (eGWAS) identified 19,602 expression quantitative trait loci (eQTLs) associated with the expression of 11,444 genes. A GWAS for alternative splicing identified 49,897 splicing QTLs (sQTLs) for 7614 genes. Genes harboring both cis-eQTLs and cis-sQTLs in linkage disequilibrium were disproportionately likely to encode transcription factors or were annotated as responding to one or more stresses. Independent component analysis of gene expression data identified loci regulating co-expression modules involved in oxidation reduction, response to water deprivation, plastid biogenesis, protein biogenesis, and plant-pathogen interaction. Several genes involved in cell proliferation, flower development, DNA replication, and gene silencing showed lower gene expression variation explained by genetic factors between temperate and tropical maize lines. A GWAS of 27 previously published phenotypes identified several candidate genes overlapping with genomic intervals showing signatures of selection during adaptation to temperate environments. Conclusion: Our results illustrate how maize transcriptional regulatory networks enable changes in transcriptional regulation to adapt to temperate regions. Supplementary information: The online version contains supplementary material available at 10.1186/s13059-023-02891-3.
Affiliation(s)
- Guangchao Sun
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Huihui Yu
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, USA
- Peng Wang
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Martha Lopez-Guerrero
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, USA
- Ravi V. Mural
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Olivier N. Mizero
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Marcin Grzybowski
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Baoxing Song
  - Institute for Genomic Diversity, Cornell University, Ithaca, USA
- Karin van Dijk
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, USA
- Daniel P. Schachtman
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
- Chi Zhang
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, USA
- James C. Schnable
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, USA
  - Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, USA
8
Zhang R, Zhang Y, Liu T, Jiang B, Li Z, Qu Y, Chen Y, Li Z. Utilizing Variants Identified with Multiple Genome-Wide Association Study Methods Optimizes Genomic Selection for Growth Traits in Pigs. Animals (Basel) 2023; 13:ani13040722. [PMID: 36830509 PMCID: PMC9952664 DOI: 10.3390/ani13040722] [Received: 12/25/2022] [Revised: 02/09/2023] [Accepted: 02/15/2023] [Indexed: 02/22/2023] Open
Abstract
Improving the prediction accuracies of economically important traits in genomic selection (GS) is a main objective for researchers and breeders in the livestock industry. This study aims at utilizing potentially functional SNPs and QTLs identified with various genome-wide association study (GWAS) models in GS of pig growth traits. We used three well-established GWAS methods, including the mixed linear model, Bayesian model and meta-analysis, as well as 60K SNP-chip and whole genome sequence (WGS) data from 1734 Yorkshire and 1123 Landrace pigs to detect SNPs related to four growth traits: average daily gain, backfat thickness, body weight and birth weight. A total of 1485 significant loci and 24 candidate genes involved in skeletal muscle development, fat deposition, lipid metabolism and insulin resistance were identified. Compared with using all SNP-chip data, GS with the pre-selected functional SNPs in the standard genomic best linear unbiased prediction (GBLUP) model and in a two-kernel based GBLUP model yielded average gains in accuracy of 4 to 46% (from 0.19 ± 0.07 to 0.56 ± 0.07) and 5 to 27% (from 0.16 ± 0.06 to 0.57 ± 0.05) for the four traits, respectively, suggesting that the prioritization of preselected functional markers in GS models has the potential to improve prediction accuracies for certain traits in livestock breeding.
Affiliation(s)
- Ruifeng Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Yi Zhang
  - Institute of Neuroscience, Panzhihua University, Panzhihua 617000, China
- Tongni Liu
  - Genetic Data Center, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Bo Jiang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhenyang Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Youping Qu
  - Guangdong IPIG Technology Co., Ltd., Guangzhou 510006, China
- Yaosheng Chen
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Zhengcao Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou 510006, China
9
Hu X, Carver BF, El-Kassaby YA, Zhu L, Chen C. Weighted kernels improve multi-environment genomic prediction. Heredity (Edinb) 2023; 130:82-91. [PMID: 36522412 PMCID: PMC9905581 DOI: 10.1038/s41437-022-00582-6] [Received: 08/25/2021] [Revised: 11/27/2022] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Crucial to variety improvement programs is the reliable and accurate prediction of genotype performance across environments. However, due to the impactful presence of genotype by environment (G×E) interaction that dictates how changes in expression and function of genes influence target traits in different environments, prediction performance of genomic selection (GS) using single-environment models often falls short. Furthermore, despite the successes of genome-wide association studies (GWAS), the genetic insights derived from genome-to-phenome mapping have not yet been incorporated in predictive analytics, making GS models that use a Gaussian kernel primarily an estimator of genomic similarity, rather than of the underlying genetic characteristics of the populations. Here, we developed a GS framework that, in addition to capturing the overall genomic relationship, can capitalize on the signal of genetic associations of the phenotypic variation as well as the genetic characteristics of the populations. The capacity of predicting the performance of populations across environments was demonstrated by an overall gain in predictability of up to 31% for the winter wheat DH population. Compared to Gaussian kernels, we showed that our multi-environment weighted kernels could better leverage the significance of genetic associations and yielded a marked improvement of 4-33% in prediction accuracy for half-sib families. Furthermore, the flexibility incorporated in our Bayesian implementation provides the generalizable capacity required for predicting multiple genetically highly heterogeneous populations across environments, allowing reliable GS for genetic improvement programs that have no access to genetically uniform material.
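The abstract does not give the kernel formula, so the following is only a sketch of one common way to weight a Gaussian kernel: each marker's squared difference is scaled by a non-negative weight before exponentiation. The function name, the normalization of weights to sum to one, the bandwidth parameter, and the toy data are all assumptions and should not be read as the authors' implementation.

```python
import math


def weighted_gaussian_kernel(genotypes, weights, bandwidth=1.0):
    """Gaussian kernel K[i][j] = exp(-sum_m w_m * (x_im - x_jm)^2 / bandwidth).

    genotypes: list of individuals, each a list of marker codes.
    weights:   non-negative per-marker weights (for example, derived from
               association signals); they are normalized to sum to 1 here.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [wi / total for wi in weights]
    n = len(genotypes)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = sum(
                wm * (a - b) ** 2
                for wm, a, b in zip(w, genotypes[i], genotypes[j])
            )
            # The kernel is symmetric, so fill both triangles at once.
            kernel[i][j] = kernel[j][i] = math.exp(-d2 / bandwidth)
    return kernel


# Toy data (invented): 3 individuals x 2 markers; marker 1 gets 3x the weight of marker 2.
K = weighted_gaussian_kernel([[0, 1], [2, 1], [0, 2]], weights=[3.0, 1.0])
```

Setting all weights equal recovers an ordinary Gaussian kernel on the scaled Euclidean distance, which makes the unweighted kernel a natural baseline for comparison.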
Affiliation(s)
- Xiaowei Hu
- Department of Statistics, Oklahoma State University, Stillwater, OK, USA; Present address: Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Brett F. Carver
- Department of Plant and Soil Sciences, Oklahoma State University, Stillwater, OK, USA
| | - Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
| | - Lan Zhu
- Department of Statistics, Oklahoma State University, Stillwater, OK, USA
| | - Charles Chen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA.
| |
|
10
|
Hawkins NT, Maldaver M, Yannakopoulos A, Guare LA, Krishnan A. Systematic tissue annotations of genomics samples by modeling unstructured metadata. Nat Commun 2022; 13:6736. [PMID: 36347858 PMCID: PMC9643451 DOI: 10.1038/s41467-022-34435-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 10/25/2022] [Indexed: 11/10/2022] Open
Abstract
There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically-meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto .
Affiliation(s)
- Nathaniel T Hawkins
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Marc Maldaver
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Anna Yannakopoulos
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Lindsay A Guare
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, 48824, USA
| | - Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
| |
|
11
|
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence. G3 (Bethesda, Md.) 2022; 12:jkac258. [PMID: 36161485 PMCID: PMC9635642 DOI: 10.1093/g3journal/jkac258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values.
Affiliation(s)
- Bruno C Perez
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
| | - Marco C A M Bink
- Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
| | | | | | - Mario P L Calus
- Corresponding author: Animal Breeding and Genomics, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands.
| |
|
12
|
Liang M, An B, Chang T, Deng T, Du L, Li K, Cao S, Du Y, Xu L, Zhang L, Gao X, Li J, Gao H. Incorporating kernelized multi-omics data improves the accuracy of genomic prediction. J Anim Sci Biotechnol 2022; 13:103. [PMID: 36127743 PMCID: PMC9490992 DOI: 10.1186/s40104-022-00756-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/08/2022] [Indexed: 11/18/2022] Open
Abstract
Background Genomic selection (GS) has revolutionized animal and plant breeding after the first implementation via early selection before measuring phenotypes. Besides genome, transcriptome and metabolome information are increasingly considered new sources for GS. Difficulties in building the model with multi-omics data for GS and the limit of specimen availability have both delayed the progress of investigating multi-omics. Results We utilized the Cosine kernel to map genomic and transcriptomic data as n × n symmetric matrix (G matrix and T matrix), combined with the best linear unbiased prediction (BLUP) for GS. Here, we defined five kernel-based prediction models: genomic BLUP (GBLUP), transcriptome-BLUP (TBLUP), multi-omics BLUP (MBLUP, M = ratio × G + (1 − ratio) × T), multi-omics single-step BLUP (mssBLUP), and weighted multi-omics single-step BLUP (wmssBLUP) to integrate transcribed individuals and genotyped resource population. The predictive accuracy evaluations in four traits of the Chinese Simmental beef cattle population showed that (1) MBLUP was far preferred to GBLUP (ratio = 1.0), (2) the prediction accuracy of wmssBLUP and mssBLUP had 4.18% and 3.37% average improvement over GBLUP, (3) We also found the accuracy of wmssBLUP increased with the growing proportion of transcribed cattle in the whole resource population. Conclusions We concluded that the inclusion of transcriptome data in GS had the potential to improve accuracy. Moreover, wmssBLUP is accepted to be a promising alternative for the present situation in which plenty of individuals are genotyped when fewer are transcribed. Supplementary Information The online version contains supplementary material available at 10.1186/s40104-022-00756-6.
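The MBLUP kernel in this abstract, M = ratio × G + (1 − ratio) × T, can be illustrated with a minimal sketch. It assumes that G and T are plain cosine-similarity matrices computed between individuals; the function names, the toy genotype and expression values, and the choice ratio = 0.5 are hypothetical and are not taken from the paper.

```python
import math


def cosine_kernel(rows):
    """Return the n x n cosine-similarity matrix for n feature vectors.

    Assumes every vector has a non-zero norm (an all-zero row would
    cause a division by zero).
    """
    norms = [math.sqrt(sum(v * v for v in row)) for row in rows]
    n = len(rows)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            kernel[i][j] = kernel[j][i] = dot / (norms[i] * norms[j])
    return kernel


def blend_kernels(G, T, ratio):
    """Combine two kernels as M = ratio * G + (1 - ratio) * T."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [
        [ratio * g + (1.0 - ratio) * t for g, t in zip(g_row, t_row)]
        for g_row, t_row in zip(G, T)
    ]


# Hypothetical toy data: 3 individuals, 4 SNP dosages and 3 transcript levels.
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
expression = [[5.1, 0.3, 2.2], [4.8, 0.9, 1.7], [0.4, 3.6, 2.9]]

G = cosine_kernel(genotypes)
T = cosine_kernel(expression)
M = blend_kernels(G, T, ratio=0.5)
```

With ratio = 1.0 the blended kernel reduces to G, which matches the abstract's note that GBLUP corresponds to ratio = 1.0. In a BLUP analysis, M would then take the place of the relationship matrix.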
Affiliation(s)
- Mang Liang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Bingxing An
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Tianpeng Chang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Tianyu Deng
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lili Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Keanning Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Sheng Cao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Yueying Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Lupei Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
| | - Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China.
| |
|
13
|
Mollandin F, Gilbert H, Croiseau P, Rau A. Accounting for overlapping annotations in genomic prediction models of complex traits. BMC Bioinformatics 2022; 23:365. [PMID: 36068513 PMCID: PMC9446854 DOI: 10.1186/s12859-022-04914-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background It is now widespread in livestock and plant breeding to use genotyping data to predict phenotypes with genomic prediction models. In parallel, genomic annotations related to a variety of traits are increasing in number and granularity, providing valuable insight into potentially important positions in the genome. The BayesRC model integrates this prior biological information by factorizing the genome according to disjoint annotation categories, in some cases enabling improved prediction of heritable traits. However, BayesRC is not adapted to cases where markers may have multiple annotations. Results We propose two novel Bayesian approaches to account for multi-annotated markers through a cumulative (BayesRC+) or preferential (BayesRCπ) model of the contribution of multiple annotation categories. We illustrate their performance on simulated data with various genetic architectures and types of annotations. We also explore their use on data from a backcross population of growing pigs in conjunction with annotations constructed using the PigQTLdb. In both simulated and real data, we observed a modest improvement in prediction quality with our models when used with informative annotations. In addition, our results show that BayesRC+ successfully prioritizes multi-annotated markers according to their posterior variance, while BayesRCπ provides a useful interpretation of informative annotations for multi-annotated markers. Finally, we explore several strategies for constructing annotations from a public database, highlighting the importance of careful consideration of this step. Conclusion When used with annotations that are relevant to the trait under study, BayesRCπ and BayesRC+ allow for improved prediction and prioritization of multi-annotated markers, and can provide useful biological insight into the genetic architecture of traits. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04914-5.
Affiliation(s)
- Fanny Mollandin
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France.
| | - Hélène Gilbert
- GenPhySE, INRAE, ENVT, Université de Toulouse, 31320, Castanet Tolosan, France
| | - Pascal Croiseau
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France
| | - Andrea Rau
- INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France; BioEcoAgro Joint Research Unit, INRAE, Université de Liège, Université de Lille, Université de Picardie Jules Verne, 50136, Estrée-Mons, France
| |
|
14
|
Hansen PB, Ruud AK, de los Campos G, Malinowska M, Nagy I, Svane SF, Thorup-Kristensen K, Jensen JD, Krusell L, Asp T. Integration of DNA Methylation and Transcriptome Data Improves Complex Trait Prediction in Hordeum vulgare. Plants 2022; 11:plants11172190. [PMID: 36079572 PMCID: PMC9459846 DOI: 10.3390/plants11172190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/23/2022] [Revised: 08/19/2022] [Accepted: 08/21/2022] [Indexed: 11/30/2022]
Abstract
Whole-genome multi-omics profiles contain valuable information for the characterization and prediction of complex traits in plants. In this study, we evaluate multi-omics models to predict four complex traits in barley (Hordeum vulgare): grain yield, thousand kernel weight, protein content, and nitrogen uptake. Genomic, transcriptomic, and DNA methylation data were obtained from 75 spring barley lines tested in the RadiMax semi-field phenomics facility under control and water-scarce treatments. By integrating multi-omics data at genomic, transcriptomic, and DNA methylation regulatory levels, a higher proportion of phenotypic variance was explained (0.72–0.91) than with genomic models alone (0.55–0.86). The correlation between predictions and phenotypes varied from 0.17–0.28 for control plants and 0.23–0.37 for water-scarce plants, and the increase in accuracy was significant for nitrogen uptake and protein content compared to models using genomic information alone. Adding transcriptomic and DNA methylation information to the prediction models explained more of the phenotypic variance attributed to the environment in grain yield and nitrogen uptake. It furthermore explained more of the non-additive genetic effects for thousand kernel weight and protein content. Our results show the feasibility of multi-omics prediction for complex traits in barley.
Affiliation(s)
- Pernille Bjarup Hansen
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Correspondence: (P.B.H.); (T.A.); Tel.: +45-87158243 (T.A.)
| | - Anja Karine Ruud
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Marta Malinowska
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Istvan Nagy
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
| | - Simon Fiil Svane
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
| | - Kristian Thorup-Kristensen
- Section for Crop Sciences, Department of Plant and Environmental Sciences, Copenhagen University, 2630 Taastrup, Denmark
| | | | - Lene Krusell
- Sejet Plant Breeding, Nørremarksvej 67, 8700 Horsens, Denmark
| | - Torben Asp
- Center for Quantitative Genetics and Genomics, Aarhus University, 4200 Slagelse, Denmark
- Correspondence: (P.B.H.); (T.A.); Tel.: +45-87158243 (T.A.)
| |
|
15
|
Wade AR, Duruflé H, Sanchez L, Segura V. eQTLs are key players in the integration of genomic and transcriptomic data for phenotype prediction. BMC Genomics 2022; 23:476. [PMID: 35764918 PMCID: PMC9238188 DOI: 10.1186/s12864-022-08690-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 06/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background Multi-omics represent a promising link between phenotypes and genome variation. Few studies yet address their integration to understand genetic architecture and improve predictability. Results Our study used 241 poplar genotypes, phenotyped in two common gardens, with xylem and cambium RNA sequenced at one site, yielding large phenotypic, genomic (SNP), and transcriptomic datasets. Prediction models for each trait were built separately for SNPs and transcripts, and compared to a third model integrated by concatenation of both omics. The advantage of integration varied across traits and, to understand such differences, an eQTL analysis was performed to characterize the interplay between the genome and transcriptome and classify the predicting features into cis or trans relationships. A strong, significant negative correlation was found between the change in predictability and the change in predictor ranking for trans eQTLs for traits evaluated in the site of transcriptomic sampling. Conclusions Consequently, beneficial integration happens when the redundancy of predictors is decreased, likely leaving the stage to other less prominent but complementary predictors. An additional gene ontology (GO) enrichment analysis appeared to corroborate such statistical output. To our knowledge, this is a novel finding delineating a promising method to explore data integration. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08690-7.
|
16
|
Mathew B, Hauptmann A, Léon J, Sillanpää MJ. NeuralLasso: Neural Networks Meet Lasso in Genomic Prediction. Frontiers in Plant Science 2022; 13:800161. [PMID: 35574107 PMCID: PMC9100816 DOI: 10.3389/fpls.2022.800161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Accepted: 03/18/2022] [Indexed: 06/15/2023]
Abstract
Prediction of complex traits based on genome-wide marker information is of central importance for both animal and plant breeding. Numerous models have been proposed for the prediction of complex traits, and considerable effort is still being devoted to improving their prediction accuracy, because various genetic factors such as additive, dominance, and epistatic effects can influence the prediction accuracy of such models. Recently, machine learning (ML) methods have been widely applied for prediction in both animal and plant breeding programs. In this study, we propose a new algorithm for genomic prediction which is based on neural networks but incorporates classical elements of LASSO. Our new method is able to account for local epistasis (higher-order interaction between neighboring markers) in the prediction. We compare the prediction accuracy of our new method with the most commonly used prediction methods, such as BayesA, BayesB, Bayesian Lasso (BL), genomic BLUP and Elastic Net (EN), using the heterogeneous stock mouse and rice field data sets.
Affiliation(s)
- Boby Mathew
- Bayer CropScience, Monheim am Rhein, Germany
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
| | - Andreas Hauptmann
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Department of Computer Science, University College London, London, United Kingdom
| | - Jens Léon
- Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
| | - Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
| |
|
17
|
Wu PY, Stich B, Weisweiler M, Shrestha A, Erban A, Westhoff P, Inghelandt DV. Improvement of prediction ability by integrating multi-omic datasets in barley. BMC Genomics 2022; 23:200. [PMID: 35279073 PMCID: PMC8917753 DOI: 10.1186/s12864-022-08337-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background Genomic prediction (GP) based on single nucleotide polymorphisms (SNP) has become a broadly used tool to increase the gain of selection in plant breeding. However, using predictors that are biologically closer to the phenotypes such as transcriptome and metabolome may increase the prediction ability in GP. The objectives of this study were to (i) assess the prediction ability for three yield-related phenotypic traits using different omic datasets as single predictors compared to a SNP array, where these omic datasets included different types of sequence variants (full-SV, deleterious-dSV, and tolerant-tSV), different types of transcriptome (expression presence/absence variation-ePAV, gene expression-GE, and transcript expression-TE) sampled from two tissues, leaf and seedling, and metabolites (M); (ii) investigate the improvement in prediction ability when combining multiple omic datasets information to predict phenotypic variation in barley breeding programs; (iii) explore the predictive performance when using SV, GE, and ePAV from simulated 3’end mRNA sequencing of different lengths as predictors. Results The prediction ability from genomic best linear unbiased prediction (GBLUP) for the three traits using dSV information was higher than when using tSV, all SV information, or the SNP array. Any predictors from the transcriptome (GE, TE, as well as ePAV) and metabolome provided higher prediction abilities compared to the SNP array and SV on average across the three traits. In addition, some (dis)similarity existed between different omic datasets, which therefore provided complementary biological perspectives on phenotypic variation. Optimally combining the information of dSV, TE, ePAV, as well as metabolites into GP models could improve the prediction ability over that of the single predictors alone. Conclusions The use of integrated omic datasets in GP models is highly recommended.
Furthermore, we evaluated a cost-effective approach generating 3’end mRNA sequencing with transcriptome data extracted from seedling without losing prediction ability in comparison to the full-length mRNA sequencing, paving the path for the use of such prediction methods in commercial breeding programs. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-022-08337-7).
|
18
|
Zhao T, Zeng J, Cheng H. Extend mixed models to multilayer neural networks for genomic prediction including intermediate omics data. Genetics 2022; 221:6536967. [PMID: 35212766 PMCID: PMC9071534 DOI: 10.1093/genetics/iyac034] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/06/2021] [Accepted: 02/17/2022] [Indexed: 11/13/2022] Open
Abstract
With the growing amount and diversity of intermediate omics data complementary to genomics (e.g. DNA methylation, gene expression, and protein abundance), there is a need to develop methods to incorporate intermediate omics data into conventional genomic evaluation. The omics data help decode the multiple layers of regulation from genotypes to phenotypes and thus naturally form a connected multilayer network. We developed a new method named NN-MM to model the multiple layers of regulation from genotypes to intermediate omics features, then to phenotypes, by extending conventional linear mixed models ("MM") to multilayer artificial neural networks ("NN"). NN-MM incorporates intermediate omics features by adding middle layers between genotypes and phenotypes. Linear mixed models (e.g. pedigree-based BLUP, GBLUP, Bayesian Alphabet, single-step GBLUP, or single-step Bayesian Alphabet) can be used to sample marker effects or genetic values on intermediate omics features, and activation functions in neural networks are used to capture the nonlinear relationships between intermediate omics features and phenotypes. NN-MM had significantly better prediction performance than the recently proposed single-step approach for genomic prediction with intermediate omics data. Compared to the single-step approach, NN-MM can handle various patterns of missing omics measures and allows nonlinear relationships between intermediate omics features and phenotypes. NN-MM has been implemented in an open-source package called "JWAS".
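The layered structure described in this abstract (genotypes, then intermediate omics features, then phenotype) can be pictured with a small forward-pass sketch. This is only an illustration under stated assumptions: the marker effects and omics weights are fixed hypothetical numbers rather than values sampled by a mixed model, tanh is an arbitrary choice of activation, and none of the names below correspond to the JWAS interface.

```python
import math


def linear_layer(inputs, weights):
    """Map an input vector through a weight matrix (rows = inputs, columns = outputs)."""
    n_out = len(weights[0])
    return [
        sum(x * weights[i][k] for i, x in enumerate(inputs))
        for k in range(n_out)
    ]


def predict_phenotype(genotype, marker_effects, omics_weights, activation=math.tanh):
    """Genotypes -> omics features (linear), then activation -> phenotype (weighted sum)."""
    omics = linear_layer(genotype, marker_effects)   # middle layer: omics features
    hidden = [activation(value) for value in omics]  # nonlinear link to the phenotype
    phenotype = sum(h * w for h, w in zip(hidden, omics_weights))
    return phenotype, omics


# Hypothetical toy values: 3 markers, 2 intermediate omics features.
genotype = [0, 1, 2]
marker_effects = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
omics_weights = [1.5, -0.5]

phenotype, omics = predict_phenotype(genotype, marker_effects, omics_weights)
```

The sketch only shows how a middle omics layer sits between genotypes and phenotypes. Estimating the effects and handling missing omics measurements are the parts the method itself addresses, and they are not reproduced here.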
Affiliation(s)
- Tianjing Zhao
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA; Integrative Genetics and Genomics Graduate Group, University of California Davis, Davis, CA 95616, USA
| | - Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Hao Cheng
- Department of Animal Science, University of California Davis, Davis, CA 95616, USA; Corresponding author: Department of Animal Science, University of California, Davis, CA 95616, USA.
| |
|
19
|
Nantongo JS, Potts BM, Frickey T, Telfer E, Dungey H, Fitzgerald H, O'Reilly-Wapstra JM. Analysis of the transcriptome of the needles and bark of Pinus radiata induced by bark stripping and methyl jasmonate. BMC Genomics 2022; 23:52. [PMID: 35026979 PMCID: PMC8759178 DOI: 10.1186/s12864-021-08231-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Plants are attacked by diverse insect and mammalian herbivores and respond with different physical and chemical defences. Transcriptional changes underlie these phenotypic changes. Simulated herbivory has been used to study the transcriptional and other early regulation events of these plant responses. In this study, constitutive and induced transcriptional responses to artificial bark stripping are compared in the needles and the bark of Pinus radiata to the responses from application of the plant stressor, methyl jasmonate. The time progression of the responses was assessed over a 4-week period. RESULTS Of the 6312 unique transcripts studied, 86.6% were differentially expressed between the needles and the bark prior to treatment. The most abundant constitutive transcripts were related to defence and photosynthesis and their expression did not differ between the needles and the bark. While no differential expression of transcripts was detected in the needles following bark stripping, in the bark this treatment caused an up-regulation and down-regulation of genes associated with primary and secondary metabolism. Methyl jasmonate treatment caused differential expression of transcripts in both the bark and the needles, with individual genes related to primary metabolism more responsive than those associated with secondary metabolism. The up-regulation of genes related to sugar break-down and the repression of genes related to photosynthesis following both treatments were consistent with the strong down-regulation of sugars that has been observed in the same population. Relative to the control, the treatments caused a differential expression of genes involved in signalling, photosynthesis, carbohydrate and lipid metabolism as well as defence and water stress. However, non-overlapping transcripts were detected between the needles and the bark, between treatments and at different times of assessment.
Methyl jasmonate induced more transcriptional responses in the bark than bark stripping, although the peak of expression following both treatments was detected 7 days post treatment application. The effects of bark stripping were localised, and no systemic changes were detected in the needles. CONCLUSION There are constitutive and induced differences in the needle and bark transcriptome of Pinus radiata. Some expression responses to bark stripping may differ from other biotic and abiotic stresses, which contributes to the understanding of plant molecular responses to diverse stresses. Whether the gene expression changes are heritable and how they differ between resistant and susceptible families identified in earlier studies needs further investigation.
Affiliation(s)
- J S Nantongo
- School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia.
- National Forestry Resources Research Institute, Mukono, Uganda.
- B M Potts
- School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
- ARC Training Centre for Forest Value, Hobart, Tasmania, Australia
- H Fitzgerald
- School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
- J M O'Reilly-Wapstra
- School of Natural Sciences, University of Tasmania, Private Bag 5, Hobart, Tasmania, 7001, Australia
- ARC Training Centre for Forest Value, Hobart, Tasmania, Australia
|
20
|
Martini JWR, Gao N, Crossa J. Incorporating Omics Data in Genomic Prediction. Methods Mol Biol 2022; 2467:341-357. [PMID: 35451782 DOI: 10.1007/978-1-0716-2205-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss the motivation for integrating other types of omics data into genomic prediction methods. We give an overview of literature investigating the performance of omics-enhanced predictions, and highlight potential pitfalls when applying these methods in breeding. We emphasize that the statistical methods available for genomic data can be transferred to the general omics case. However, when using a framework of omic relationship matrices, the standardization of the variables may be more relevant than it is for a genomic relationship matrix based on single-nucleotide polymorphisms.
Affiliation(s)
- Johannes W R Martini
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico.
- Ning Gao
- School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Veracruz, CP, Mexico
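
The Martini et al. abstract above notes that, in a framework of omic relationship matrices, standardization of the variables may matter more than it does for a SNP-based genomic relationship matrix. A minimal sketch of why, assuming a simple cross-product definition of the relationship matrix and simulated data (the function name `relationship_matrix` and the toy omics layer are illustrative assumptions, not taken from the chapter):

```python
import numpy as np

def relationship_matrix(features, scale=True):
    """Build an n x n relationship matrix from an n x p feature matrix.

    Columns are centered; with scale=True they are also divided by their
    sample standard deviation, so that no single high-variance variable
    dominates the cross-product.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    if scale:
        sd = x.std(axis=0, ddof=1)
        keep = sd > 0  # drop constant columns, which carry no information
        x = x[:, keep] / sd[keep]
    return x @ x.T / x.shape[1]

rng = np.random.default_rng(0)
# Hypothetical omics layer: 5 individuals, 3 variables on very different scales.
omics = rng.normal(size=(5, 3)) * np.array([1.0, 10.0, 1000.0])

k_raw = relationship_matrix(omics, scale=False)  # dominated by the third variable
k_std = relationship_matrix(omics, scale=True)   # each variable weighted equally
```

Without scaling, the variable measured on the largest scale determines almost all of the similarity structure; SNP genotypes share a bounded coding, which is one reason the issue is less acute for genomic relationship matrices.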
|
21
|
Wang J, Guan J, Yixi K, Shu T, Chai Z, Wang J, Wang H, Wu Z, Cai X, Zhong J, Luo X. Comparative transcriptome analysis of winter yaks in plateau and plain. Reprod Domest Anim 2021; 57:64-71. [PMID: 34695258 DOI: 10.1111/rda.14029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Accepted: 10/11/2021] [Indexed: 11/29/2022]
Abstract
The yak is an important resource for the people and the ecological environment of the Qinghai-Tibet Plateau. Every winter, many domestic yaks lose body weight or die from cold and food scarcity. Moving plateau yaks to farms on the plain is a useful approach to reduce their environmental stress and increase production. In this study, we measured growth, slaughter and beef quality traits every month and sequenced muscle mRNA from two groups of yaks living on the plateau and the plain, respectively. We found significant differences (p-value <0.01) in the third (60 days), fourth (90 days), fifth (120 days) and sixth (150 days) weights between the plateau and plain subpopulations. We identified 540 differentially expressed genes (DEGs), including 123 known genes and 417 unknown genes. When weighted correlation network analysis (WGCNA) was used to build a co-expression network, the modules were strongly related to weight traits. The findings point to underlying mechanisms and a related network, giving a new view of gene expression differences between yaks living on the plateau and the plain.
Affiliation(s)
- Jiabo Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Jiuqiang Guan
- Sichuan Academy of Grassland Sciences, Chengdu, China
- Kangzhu Yixi
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Tao Shu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Zhixin Chai
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Jikun Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Hui Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Zhijuan Wu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Xin Cai
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Jincheng Zhong
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, China
- Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization Key Laboratory of Sichuan Province, Chengdu, China
- Xiaolin Luo
- Sichuan Academy of Grassland Sciences, Chengdu, China
|
22
|
Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize. PLoS Genet 2021; 17:e1009568. [PMID: 34606492 PMCID: PMC8516254 DOI: 10.1371/journal.pgen.1009568] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/27/2021] [Revised: 10/14/2021] [Accepted: 09/07/2021] [Indexed: 11/19/2022] Open
Abstract
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes and it has generated little insight about the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype associated RNA expression (HARE), study its transferability across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across different tissues and populations compared to measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models evaluated within and across two diverse maize panels, a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel), for predicting 26 complex traits. HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy as compared to measured expression values. The accuracy achieved by measured expression was variable across tissues, whereas accuracy by HARE was more stable across tissues. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
Genomic marker data is widely used in the prediction of many traits. However, prediction has been primarily carried out within populations and without explicit modeling of RNA or protein expression. In this study, we explored the prediction of field traits within and across populations using estimated RNA expression attributable to only the DNA sequence around a gene. We showed that the estimated RNA expression was more transferable across populations and tissues than measured RNA expression. We improved prediction of field traits by up to 15% using estimated gene expression as compared to observed expression or gene sequence alone. Overall, these findings indicate that structural and functional information in the gene sequence is highly transferable.
|
23
|
Pazhamala LT, Kudapa H, Weckwerth W, Millar AH, Varshney RK. Systems biology for crop improvement. The Plant Genome 2021; 14:e20098. [PMID: 33949787 DOI: 10.1002/tpg2.20098] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/05/2020] [Accepted: 03/09/2021] [Indexed: 05/19/2023]
Abstract
In recent years, generation of large-scale data from the genome, transcriptome, proteome, metabolome, epigenome, and others has become routine in several plant species. Most of these datasets in different crop species, however, were studied independently and as a result, full insight could not be gained on the molecular basis of complex traits and biological networks. A systems biology approach involving integration of multiple omics data, modeling, and prediction of the cellular functions is required to understand the flow of biological information that underlies complex traits. In this context, systems biology with multiomics data integration is crucial and allows a holistic understanding of the dynamic system, in which the different levels of biological organization interact with the external environment to produce a phenotype. Here, we present recent progress made in the area of various omics studies and in integrative and systems biology approaches, with a special focus on application to crop improvement. We have also discussed the challenges and opportunities in multiomics data integration, modeling, and understanding of the biology of complex traits underpinning yield and stress tolerance in major cereals and legumes.
Affiliation(s)
- Lekha T Pazhamala
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Himabindu Kudapa
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Vienna Metabolomics Center, University of Vienna, Vienna, Austria
- A Harvey Millar
- ARC Centre of Excellence in Plant Energy Biology and School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia
- Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- State Agricultural Biotechnology Centre, Crop Research Innovation Centre, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
|
24
|
Rychkov D, Neely J, Oskotsky T, Yu S, Perlmutter N, Nititham J, Carvidi A, Krueger M, Gross A, Criswell LA, Ashouri JF, Sirota M. Cross-Tissue Transcriptomic Analysis Leveraging Machine Learning Approaches Identifies New Biomarkers for Rheumatoid Arthritis. Front Immunol 2021; 12:638066. [PMID: 34177888 PMCID: PMC8223752 DOI: 10.3389/fimmu.2021.638066] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/21/2021] [Accepted: 05/17/2021] [Indexed: 01/20/2023] Open
Abstract
There is an urgent need to identify biomarkers for diagnosis and disease activity monitoring in rheumatoid arthritis (RA). We leveraged publicly available microarray gene expression data in the NCBI GEO database for whole blood (N=1,885) and synovial (N=284) tissues from RA patients and healthy controls. We developed a robust machine learning feature selection pipeline with validation on five independent datasets culminating in 13 genes: TNFAIP6, S100A8, TNFSF10, DRAM1, LY96, QPCT, KYNU, ENTPD1, CLIC1, ATP6V0E1, HSP90AB1, NCL and CIRBP which define the RA score and demonstrate its clinical utility: the score tracks the disease activity DAS28 (p = 7e-9), distinguishes osteoarthritis (OA) from RA (OR 0.57, p = 8e-10) and polyJIA from healthy controls (OR 1.15, p = 2e-4) and monitors treatment effect in RA (p = 2e-4). Finally, the immunoblotting analysis of six proteins on an independent cohort confirmed two proteins, TNFAIP6/TSG6 and HSP90AB1/HSP90.
Affiliation(s)
- Dmitry Rychkov
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, United States
- Department of Surgery, University of California San Francisco, San Francisco, CA, United States
- Department of Pediatrics, University of California San Francisco, San Francisco, CA, United States
- Jessica Neely
- Department of Pediatrics, University of California San Francisco, San Francisco, CA, United States
- Tomiko Oskotsky
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, United States
- Steven Yu
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, United States
- Noah Perlmutter
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Joanne Nititham
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Alexander Carvidi
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Melissa Krueger
- Department of Medicine, Oregon Health & Science University, Portland, OR, United States
- Andrew Gross
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Lindsey A. Criswell
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Institute for Human Genetics (IHG), University of California San Francisco, San Francisco, CA, United States
- Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Department of Orofacial Sciences, University of California San Francisco, San Francisco, CA, United States
- Judith F. Ashouri
- Rosalind Russell/Ephraim P. Engleman Rheumatology Research Center, Division of Rheumatology, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Marina Sirota
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, United States
- Department of Pediatrics, University of California San Francisco, San Francisco, CA, United States
|
25
|
Rice BR, Lipka AE. Diversifying maize genomic selection models. Molecular Breeding: New Strategies in Plant Improvement 2021; 41:33. [PMID: 37309328 PMCID: PMC10236107 DOI: 10.1007/s11032-021-01221-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/16/2020] [Accepted: 03/07/2021] [Indexed: 06/14/2023]
Abstract
Genomic selection (GS) is one of the most powerful tools available for maize breeding. Its use of genome-wide marker data to estimate breeding values translates to increased genetic gains with fewer breeding cycles. In this review, we cover the history of GS and highlight particular milestones during its adaptation to maize breeding. We discuss how GS can be applied to developing superior maize inbreds and hybrids. Additionally, we characterize refinements in GS models that could enable the encapsulation of non-additive genetic effects, genotype by environment interactions, and multiple levels of the biological hierarchy, all of which could ultimately result in more accurate predictions of breeding values. Finally, we suggest the stages in a maize breeding program where it would be beneficial to apply GS. Given the current sophistication of high-throughput phenotypic, genotypic, and other -omic level data currently available to the maize community, now is the time to explore the implications of their incorporation into GS models and thus ensure that genetic gains are being achieved as quickly and efficiently as possible.
Affiliation(s)
- Brian R. Rice
- Department of Crop Sciences, University of Illinois, Urbana, IL USA
|
26
|
Campbell MT, Hu H, Yeats TH, Brzozowski LJ, Caffe-Treml M, Gutiérrez L, Smith KP, Sorrells ME, Gore MA, Jannink JL. Improving Genomic Prediction for Seed Quality Traits in Oat (Avena sativa L.) Using Trait-Specific Relationship Matrices. Front Genet 2021; 12:643733. [PMID: 33868378 PMCID: PMC8044359 DOI: 10.3389/fgene.2021.643733] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/18/2020] [Accepted: 03/04/2021] [Indexed: 11/13/2022] Open
Abstract
The observable phenotype is the manifestation of information that is passed along different organization levels (transcriptional, translational, and metabolic) of a biological system. The widespread use of various omic technologies (RNA-sequencing, metabolomics, etc.) has provided plant geneticists and breeders with a wealth of information on pertinent intermediate molecular processes that may help explain variation in conventional traits such as yield, seed quality, and fitness, among others. A major challenge is effectively using these data to help predict the genetic merit of new, unobserved individuals for conventional agronomic traits. Trait-specific genomic relationship matrices (TGRMs) model the relationships between individuals using genome-wide markers (SNPs) and place greater emphasis on markers that are most relevant to the trait compared to conventional genomic relationship matrices. Given that these approaches define relationships based on putative causal loci, it is expected that these approaches should improve predictions for related traits. In this study we evaluated the use of TGRMs to accommodate information on intermediate molecular phenotypes (referred to as endophenotypes) and to predict an agronomic trait, total lipid content, in oat seed. Nine fatty acids were quantified in a panel of 336 oat lines. Marker effects were estimated for each endophenotype, and were used to construct TGRMs. A multikernel TGRM model (MK-TGRM-BLUP) was used to predict total seed lipid content in an independent panel of 210 oat lines. The MK-TGRM-BLUP approach significantly improved predictions for total lipid content when compared to a conventional genomic BLUP (gBLUP) approach. Given that the MK-TGRM-BLUP approach leverages information on the nine fatty acids to predict genetic values for total lipid content in unobserved individuals, we compared the MK-TGRM-BLUP approach to a multi-trait gBLUP (MT-gBLUP) approach that jointly fits phenotypes for fatty acids and total lipid content. The MK-TGRM-BLUP approach significantly outperformed MT-gBLUP. Collectively, these results highlight the utility of using TGRMs to accommodate information on endophenotypes and improve genomic prediction for a conventional agronomic trait.
Affiliation(s)
- Malachy T. Campbell
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Haixiao Hu
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Trevor H. Yeats
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Lauren J. Brzozowski
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Melanie Caffe-Treml
- Seed Technology Lab 113, Agronomy, Horticulture & Plant Science, South Dakota State University, Brookings, SD, United States
- Lucía Gutiérrez
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI, United States
- Kevin P. Smith
- Department of Agronomy & Plant Genetics, University of Minnesota, St. Paul, MN, United States
- Mark E. Sorrells
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Michael A. Gore
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jean-Luc Jannink
- Plant Breeding & Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- R.W. Holley Center for Agriculture & Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY, United States
|
27
|
Baba T, Pegolo S, Mota LFM, Peñagaricano F, Bittante G, Cecchinato A, Morota G. Integrating genomic and infrared spectral data improves the prediction of milk protein composition in dairy cattle. Genet Sel Evol 2021; 53:29. [PMID: 33726672 PMCID: PMC7968271 DOI: 10.1186/s12711-021-00620-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/15/2020] [Accepted: 03/01/2021] [Indexed: 11/20/2022] Open
Abstract
Background Over the past decade, Fourier transform infrared (FTIR) spectroscopy has been used to predict novel milk protein phenotypes. Genomic data might help predict these phenotypes when integrated with milk FTIR spectra. The objective of this study was to investigate prediction accuracy for milk protein phenotypes when heterogeneous on-farm, genomic, and pedigree data were integrated with the spectra. To this end, we used the records of 966 Italian Brown Swiss cows with milk FTIR spectra, on-farm information, medium-density genetic markers, and pedigree data. True and total whey protein, and five casein, and two whey protein traits were analyzed. Multiple kernel learning constructed from spectral and genomic (pedigree) relationship matrices and multilayer BayesB assigning separate priors for FTIR and markers were benchmarked against a baseline partial least squares (PLS) regression. Seven combinations of covariates were considered, and their predictive abilities were evaluated by repeated random sub-sampling and herd cross-validations (CV). Results Addition of the on-farm effects such as herd, days in milk, and parity to spectral data improved predictions as compared to those obtained using the spectra alone. Integrating genomics and/or the top three markers with a large effect further enhanced the predictions. Pedigree data also improved prediction, but to a lesser extent than genomic data. Multiple kernel learning and multilayer BayesB increased predictive performance, whereas PLS did not. Overall, multilayer BayesB provided better predictions than multiple kernel learning, and lower prediction performance was observed in herd CV compared to repeated random sub-sampling CV. Conclusions Integration of genomic information with milk FTIR spectra can enhance milk protein trait predictions by 25% and 7% on average for repeated random sub-sampling and herd CV, respectively. Multiple kernel learning and multilayer BayesB outperformed PLS when used to integrate heterogeneous data for phenotypic predictions.
Affiliation(s)
- Toshimi Baba
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Sara Pegolo
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy.
- Lucio F M Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy
- Francisco Peñagaricano
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Giovanni Bittante
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy
- Alessio Cecchinato
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020, Legnaro, Italy
- Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Center for Advanced Innovation in Agriculture, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
|
28
|
Gonçalves MTV, Morota G, Costa PMDA, Vidigal PMP, Barbosa MHP, Peternelli LA. Near-infrared spectroscopy outperforms genomics for predicting sugarcane feedstock quality traits. PLoS One 2021; 16:e0236853. [PMID: 33661948 PMCID: PMC7932073 DOI: 10.1371/journal.pone.0236853] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/13/2020] [Accepted: 01/20/2021] [Indexed: 11/19/2022] Open
Abstract
The main objectives of this study were to evaluate the prediction performance of genomic and near-infrared spectroscopy (NIR) data and to test whether the integration of genomic and NIR predictor variables can increase the prediction accuracy of two feedstock quality traits (fiber and sucrose content) in a sugarcane population (Saccharum spp.). The following three modeling strategies were compared: M1 (genome-based prediction), M2 (NIR-based prediction), and M3 (integration of genomics and NIR wavenumbers). Data were collected from a commercial population comprising 385 individuals, genotyped for single nucleotide polymorphisms and screened using NIR spectroscopy. We compared partial least squares (PLS) and BayesB regression methods to estimate marker and wavenumber effects. In order to assess model performance, we employed random sub-sampling cross-validation to calculate the mean Pearson correlation coefficient between observed and predicted values. Our results showed that models fitted using BayesB were more predictive than PLS models. We found that NIR (M2) provided the highest prediction accuracy, whereas genomics (M1) presented the lowest predictive ability, regardless of the measured traits and regression methods used. The integration of predictors derived from NIR spectroscopy and genomics into a single model (M3) did not significantly improve the prediction accuracy for the two traits evaluated. These findings suggest that NIR-based prediction can be an effective strategy for predicting the genetic merit of sugarcane clones.
Affiliation(s)
- Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States of America
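
The evaluation scheme described in the Gonçalves et al. abstract above (random sub-sampling cross-validation, scored by the mean Pearson correlation between observed and predicted values) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: closed-form ridge regression is swapped in as a stand-in for the PLS and BayesB models, and the function names, split fraction, and number of repeats are assumptions.

```python
import numpy as np

def ridge_fit_predict(x_train, y_train, x_test, lam=1.0):
    """Closed-form ridge regression, used here only as a stand-in model."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    p = xc.shape[1]
    beta = np.linalg.solve(xc.T @ xc + lam * np.eye(p), xc.T @ (y_train - y_mean))
    return y_mean + (x_test - x_mean) @ beta

def subsampling_cv(x, y, n_rep=20, test_frac=0.2, lam=1.0, seed=0):
    """Mean Pearson r between observed and predicted values over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, int(round(test_frac * n)))
    r_values = []
    for _ in range(n_rep):
        idx = rng.permutation(n)  # a fresh random train/test split each repeat
        test, train = idx[:n_test], idx[n_test:]
        pred = ridge_fit_predict(x[train], y[train], x[test], lam)
        r_values.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(r_values))

# Simulated data: 100 individuals, 20 predictors, 5 with true effects.
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:5] = 1.0
y = x @ beta_true + rng.normal(scale=0.5, size=100)

mean_r = subsampling_cv(x, y)
print(f"mean Pearson r over splits: {mean_r:.2f}")
```

Comparing predictor sets such as genomic markers, NIR wavenumbers, or both would amount to calling `subsampling_cv` with different `x` matrices and the same seed, so that every model is scored on identical splits.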
|
29
|
Morgante F, Huang W, Sørensen P, Maltecca C, Mackay TFC. Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits. G3 (Bethesda, Md.) 2020; 10:4599-4613. [PMID: 33106232 PMCID: PMC7718734 DOI: 10.1534/g3.120.401847] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 08/31/2020] [Accepted: 10/12/2020] [Indexed: 02/07/2023]
Abstract
The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11, for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.
Affiliation(s)
- Fabio Morgante
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
- Wen Huang
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
- Peter Sørensen
  - Center of Quantitative Genetics and Genomics and Department of Molecular Biology and Genetics, Aarhus University, Tjele 8830, Denmark
- Christian Maltecca
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
  - Department of Animal Science, North Carolina State University, Raleigh, NC 27695
- Trudy F C Mackay
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695
|
30
|
Ye S, Li J, Zhang Z. Multi-omics-data-assisted genomic feature markers preselection improves the accuracy of genomic prediction. J Anim Sci Biotechnol 2020; 11:109. [PMID: 33292577 PMCID: PMC7708144 DOI: 10.1186/s40104-020-00515-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/06/2020] [Accepted: 09/22/2020] [Indexed: 12/02/2022] Open
Abstract
Background: Presently, multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are available to improve genomic predictors. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, using whole-genome sequencing (WGS) and gene expression level data together, four strategies for single-nucleotide polymorphism (SNP) preselection were investigated for genomic predictions in the Drosophila Genetic Reference Panel.
Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracies were 0.208 ± 0.020 (0.181 ± 0.022) for the startle response and 0.272 ± 0.017 (0.307 ± 0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor genomic feature BLUP (GFBLUP) improved the prediction accuracy when using SNPs preselected from complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, when using SNPs preselected from the WGS data based on the results of expression quantitative trait locus (eQTL) mapping of all genes, only the startle response had greater accuracy than GBLUP with the complete WGS data; the best accuracy values in the female and male lines were 0.243 ± 0.020 and 0.220 ± 0.022, respectively. Importantly, when using SNPs preselected based on the results of eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP achieved high accuracy and small bias of genomic prediction. Compared with GBLUP using complete WGS data, the best accuracy values represented increases of 60.66% and 39.09% for starvation resistance and 27.40% and 35.36% for the startle response in the female and male lines, respectively.
Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
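GBLUP, the baseline in this abstract, predicts genetic values from a genomic relationship matrix built from marker genotypes. The following is a minimal sketch of that idea, assuming a VanRaden-style relationship matrix, 0/1/2 genotype coding, and a user-supplied heritability `h2` in place of estimated variance components. These choices and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix from an (n_lines x n_snps) 0/1/2 matrix.

    Assumes every SNP is polymorphic, so the scaling term is positive.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequency per SNP
    Z = M - 2.0 * p                     # center each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / scale


def gblup_predict(G, y_train, train_idx, test_idx, h2=0.5):
    """Predict phenotypes of test lines from training lines via GBLUP.

    h2 is an assumed heritability; it fixes the ratio of residual to
    genetic variance instead of estimating it (e.g., by REML).
    """
    y_train = np.asarray(y_train, dtype=float)
    lam = (1.0 - h2) / h2
    mu = y_train.mean()
    G_train = G[np.ix_(train_idx, train_idx)]
    G_cross = G[np.ix_(test_idx, train_idx)]
    # Solve (G_train + lam * I) alpha = y_train - mu
    alpha = np.linalg.solve(G_train + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_cross @ alpha
```

SNP preselection, as investigated in the study, would correspond to passing only the selected columns of the genotype matrix to `vanraden_grm`; how those columns are chosen is outside this sketch.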
Affiliation(s)
- Shaopan Ye
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Jiaqi Li
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Zhe Zhang
  - Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
|
31
|
Pook T, Freudenthal J, Korte A, Simianer H. Using Local Convolutional Neural Networks for Genomic Prediction. Front Genet 2020; 11:561497. [PMID: 33281867 PMCID: PMC7689358 DOI: 10.3389/fgene.2020.561497] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/12/2020] [Accepted: 10/12/2020] [Indexed: 11/18/2022] Open
Abstract
The prediction of breeding values and phenotypes is of central importance for both livestock and crop breeding. In this study, we analyze the use of artificial neural networks (ANN) and, in particular, local convolutional neural networks (LCNN) for genomic prediction, as a region-specific filter corresponds much better with our prior knowledge of the genetic architecture of traits than traditional convolutional neural networks do. Model performance is evaluated on a simulated maize data panel (n = 10,000; p = 34,595) and real Arabidopsis data (n = 2,039; p = 180,000) for a variety of traits based on predictive ability. The baseline LCNN, containing one local convolutional layer (kernel size: 10) and two fully connected layers with 64 nodes each, outperforms commonly proposed ANNs (multilayer perceptrons and convolutional neural networks) for basically all considered traits. For traits with high heritability and a large training population, as present in the simulated data, LCNNs even outperform state-of-the-art methods like genomic best linear unbiased prediction (GBLUP), Bayesian models, and extended GBLUP, indicated by an increase in predictive ability of up to 24%. However, for small training populations, these state-of-the-art methods outperform all considered ANNs. Nevertheless, the LCNN still outperforms all other considered ANNs by around 10%. Minor improvements to the tested baseline network architecture of the LCNN were obtained by increasing the kernel size and reducing the stride, whereas the number of subsequent fully connected layers and their node sizes had negligible impact. Although gains in predictive ability were obtained for large-scale data sets by using LCNNs, the practical use of ANNs comes with additional problems, such as the need to genotype all considered individuals and the lack of estimates of heritability and reliability. Furthermore, breeding values are additive by design, whereas ANN-based estimates are not. However, ANNs also come with new opportunities, as networks can easily be extended to account for additional inputs (omics, weather, etc.) and outputs (multi-trait models), and computing time increases linearly with the number of individuals. With advances in high-throughput phenotyping and cheaper genotyping, ANNs can become a valid alternative for genomic prediction.
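The distinguishing feature of a local convolutional layer, as described in this abstract, is that each genomic window gets its own filter rather than sharing one filter across all positions. The sketch below shows only the forward pass of such a layer for one individual, in plain Python. The function name, the argument layout, and the toy numbers are illustrative assumptions; the full network in the study (including its two fully connected layers and training procedure) is not reproduced here.

```python
def local_conv1d(markers, weights, biases, stride):
    """Forward pass of a 1D locally connected (region-specific) layer.

    markers: list of marker codes for one individual
    weights: one weight vector per window; weights[w] applies only to window w
    biases:  one bias per window
    stride:  step between the starts of consecutive windows
    """
    kernel_size = len(weights[0])
    n_windows = (len(markers) - kernel_size) // stride + 1
    if len(weights) != n_windows or len(biases) != n_windows:
        raise ValueError("need exactly one filter and one bias per window")
    outputs = []
    for w in range(n_windows):
        start = w * stride
        window = markers[start:start + kernel_size]
        # Unlike an ordinary convolution, the filter changes with the window.
        outputs.append(sum(wi * xi for wi, xi in zip(weights[w], window)) + biases[w])
    return outputs


# Toy example: six markers, kernel size 2, stride 2, so three windows.
example = local_conv1d(
    [0, 1, 2, 2, 1, 0],
    [[1, 1], [0.5, 0.5], [1, -1]],
    [0, 0, 0],
    stride=2,
)
```

Setting every row of `weights` to the same vector recovers an ordinary shared-filter convolution, which makes the difference between the two layer types easy to check.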
Affiliation(s)
- Torsten Pook
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany
- Jan Freudenthal
  - Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Wuerzburg, Wuerzburg, Germany
- Henner Simianer
  - Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Goettingen, Göttingen, Germany
|
32
|
A novel computational approach for predicting complex phenotypes in Drosophila (starvation-sensitive and sterile) by deriving their gene expression signatures from public data. PLoS One 2020; 15:e0240824. [PMID: 33104720 PMCID: PMC7588067 DOI: 10.1371/journal.pone.0240824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2020] [Accepted: 10/05/2020] [Indexed: 11/19/2022] Open
Abstract
Many research teams perform numerous genetic, transcriptomic, proteomic and other types of omic experiments to understand molecular, cellular and physiological mechanisms of disease and health. Often (but not always), the results of these experiments are deposited in publicly available repository databases. These data records often include phenotypic characteristics following genetic and environmental perturbations, with the aim of discovering underlying molecular mechanisms leading to the phenotypic responses. A constrained set of phenotypic characteristics is usually recorded, and these are mostly hypothesis driven or possible to record within financial or practical constraints. We present a novel proof-of-principle computational approach for combining publicly available gene-expression data from control/mutant animal experiments that exhibit a particular phenotype, and we use this approach to predict unobserved phenotypic characteristics in new experiments (data derived from EBI’s ArrayExpress and ExpressionAtlas, respectively). We utilised available microarray gene-expression data for two phenotypes (starvation-sensitive and sterile) in Drosophila. The data were combined using a linear mixed-effects model with the inclusion of consecutive principal components to account for variability between experiments, in conjunction with Gene Ontology enrichment analysis. We show how available data can be ranked according to their likelihood of exhibiting these two phenotypes using random forest. The results from our study show that it is possible to integrate seemingly different gene-expression microarray data and predict a potential phenotypic manifestation with a relatively high degree of confidence (>80% AUC). This provides thus far unexplored opportunities for inferring unknown and unbiased phenotypic characteristics from already performed experiments, in order to identify studies for future analyses. Molecular mechanisms associated with gene and environment perturbations are intrinsically linked and give rise to a variety of phenotypic manifestations. Therefore, unravelling the phenotypic spectrum can help to gain insights into disease mechanisms associated with gene and environmental perturbations. Our approach uses public data that are set to increase in volume, thus providing value for money.
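The ">80% AUC" reported in this abstract refers to the area under the ROC curve, which can be read as the probability that a randomly chosen experiment with the phenotype is ranked above a randomly chosen experiment without it. As a rough illustration of that metric only, the sketch below computes AUC by direct pairwise comparison; the function name and the toy labels and scores are hypothetical, and the random forest that would produce the scores is not shown.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for experiments showing the phenotype, 0 otherwise
    scores: classifier scores, higher meaning more likely to show the phenotype
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up labels and scores.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of samples, which is acceptable for a handful of experiments; a rank-based formula would be the usual choice for larger collections.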
|
33
|
Xu L, Gao N, Wang Z, Xu L, Liu Y, Chen Y, Xu L, Gao X, Zhang L, Gao H, Zhu B, Li J. Incorporating Genome Annotation Into Genomic Prediction for Carcass Traits in Chinese Simmental Beef Cattle. Front Genet 2020; 11:481. [PMID: 32499816 PMCID: PMC7243208 DOI: 10.3389/fgene.2020.00481] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/12/2019] [Accepted: 04/17/2020] [Indexed: 01/08/2023] Open
Abstract
Various methods have been proposed for genomic prediction (GP) in livestock. These methods have mainly focused on statistical considerations and did not include genome annotation information. In this study, to improve the predictive performance for carcass traits in Chinese Simmental beef cattle, we incorporated genome annotation information into GP. Single nucleotide polymorphisms (SNPs) were annotated to five genomic classes: intergenic, gene, exon, protein coding sequences, and 3'/5' untranslated regions. Haploblocks were constructed for all markers and for these five genomic classes by defining a biologically functional unit, and haplotype effects were modeled using both numerical dosage and categorical coding strategies. The first-order epistatic effects among SNPs and haplotypes were modeled using a categorical epistasis model. For all markers, the extension from the SNP-based model to a haplotype-based model improved the accuracy by 5.4-9.8% for carcass weight (CW), live weight (LW), and striploin (SI). For the five genomic classes using the haplotype-based prediction model, the incorporation of gene class information into the model improved the accuracies by an average of 1.4, 2.1, and 1.3% for CW, LW, and SI, respectively, compared with the corresponding results for all markers. Including the first-order epistatic effects in the prediction models improved the accuracies for some traits and genomic classes. Therefore, for traits with moderate-to-high heritability, incorporating genome annotation information of gene class into haplotype-based prediction models could be considered a promising tool for GP in Chinese Simmental beef cattle, and modeling epistasis in prediction can further increase the accuracy to some degree.
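The abstract mentions two ways of coding haplotypes within a haploblock, numerical dosage and categorical, without giving details. The sketch below shows one plausible reading, stated here as an assumption: dosage coding counts the copies (0, 1, or 2) of each haplotype allele an animal carries, while categorical coding treats each unordered pair of haplotype alleles as its own class and one-hot encodes it. The function names and the toy haplotype strings are hypothetical, and the study's exact coding may differ.

```python
from collections import Counter


def dosage_coding(diplotypes):
    """Copies (0/1/2) of each haplotype allele carried by each individual."""
    alleles = sorted({hap for pair in diplotypes for hap in pair})
    rows = []
    for pair in diplotypes:
        counts = Counter(pair)
        rows.append([counts[allele] for allele in alleles])
    return alleles, rows


def categorical_coding(diplotypes):
    """One-hot indicator of each individual's unordered haplotype pair."""
    genotypes = [tuple(sorted(pair)) for pair in diplotypes]
    categories = sorted(set(genotypes))
    rows = [[1 if g == c else 0 for c in categories] for g in genotypes]
    return categories, rows


# Toy haploblock with two haplotype alleles and four animals.
toy = [("AC", "AC"), ("AC", "GT"), ("GT", "AC"), ("GT", "GT")]
dosage_alleles, dosage_rows = dosage_coding(toy)
classes, class_rows = categorical_coding(toy)
```

Under this reading, dosage coding assumes additive haplotype effects, whereas categorical coding lets each haplotype pair have its own effect at the cost of more parameters per block.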
Affiliation(s)
- Ling Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Ning Gao
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Zezhao Wang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lei Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Ying Liu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Yan Chen
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lingyang Xu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xue Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lupei Zhang
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Huijiang Gao
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Bo Zhu
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
- Junya Li
  - Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
  - National Centre of Beef Cattle Genetic Evaluation, Beijing, China
|
34
|
Azodi CB, Pardo J, VanBuren R, de Los Campos G, Shiu SH. Transcriptome-Based Prediction of Complex Traits in Maize. Plant Cell 2020; 32:139-151. [PMID: 31641024 PMCID: PMC6961623 DOI: 10.1105/tpc.19.00332] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 05/06/2019] [Revised: 09/24/2019] [Accepted: 10/21/2019] [Indexed: 05/11/2023]
Abstract
The ability to predict traits from genome-wide sequence information (i.e., genomic prediction) has improved our understanding of the genetic basis of complex traits and transformed breeding practices. Transcriptome data may also be useful for genomic prediction. However, it remains unclear how well transcript levels can predict traits, particularly when traits are scored at different development stages. Using maize (Zea mays) genetic markers and transcript levels from seedlings to predict mature plant traits, we found that transcript and genetic marker models have similar performance. When the transcripts and genetic markers with the greatest weights (i.e., the most important) in those models were used in one joint model, performance increased. Furthermore, genetic markers important for predictions were not close to or identified as regulatory variants for important transcripts. These findings demonstrate that transcript levels are useful for predicting traits and that their predictive power is not simply due to genetic variation in the transcribed genomic regions. Finally, genetic marker models identified only 1 of 14 benchmark flowering-time genes, while transcript models identified 5. These data highlight that, in addition to being useful for genomic prediction, transcriptome data can provide a link between traits and variation that cannot be readily captured at the sequence level.
Affiliation(s)
- Christina B Azodi
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
- Jeremy Pardo
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
- Robert VanBuren
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan 48824
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
- Gustavo de Los Campos
  - Epidemiology and Biostatistics and Statistics and Probability Departments, Michigan State University, East Lansing, Michigan 48824
- Shin-Han Shiu
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
  - The DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan 48824
  - Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan 48824
|