1
Washburn JD, Varela JI, Xavier A, Chen Q, Ertl D, Gage JL, Holland JB, Lima DC, Romay MC, Lopez-Cruz M, de los Campos G, Barber W, Zimmer C, Silva IT, Rocha F, Rincent R, Ali B, Hu H, Runcie DE, Gusev K, Slabodkin A, Bax P, Aubert J, Gangloff H, Mary-Huard T, Vanrenterghem T, Quesada-Traver C, Yates S, Ariza-Suárez D, Ulrich A, Wyler M, Kick DR, Bellis ES, Causey JL, Chavez ES, Wang Y, Piyush V, Fernando GD, Hu RK, Kumar R, Timon AJ, Venkatesh R, Abá KS, Chen H, Ranaweera T, Shiu SH, Wang P, Gordon MJ, Amos BK, Busato S, Perondi D, Gogna A, Psaroudakis D, Chen CPJ, Al-Mamun HA, Danilevicz MF, Upadhyaya SR, Edwards D, de Leon N. Global Genotype by Environment Prediction Competition Reveals That Diverse Modeling Strategies Can Deliver Satisfactory Maize Yield Estimates. bioRxiv: the preprint server for biology 2024:2024.09.13.612969. [PMID: 39345633 PMCID: PMC11429743 DOI: 10.1101/2024.09.13.612969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023 the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over nine years. The competition attracted registrants from around the world, with representation from academic, government, industry, and non-profit institutions as well as unaffiliated individuals. These participants came from diverse disciplines including plant science, animal science, breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved two models combining machine learning and traditional breeding tools: one model emphasized environment using features extracted by Random Forest, Ridge Regression and Least-squares, and one focused on genetics. Other high-performing teams' methods included quantitative genetics, classical machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Affiliation(s)
- Jacob D. Washburn
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
| | - José Ignacio Varela
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
| | - Alencar Xavier
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
- Department of Agronomy, Purdue University, 915 Mitch Daniels Blvd, West Lafayette, IN 47907, United States
| | - Qiuyue Chen
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
| | - David Ertl
- Iowa Corn Promotion Board, Johnston, IA, 50131, USA
| | - Joseph L. Gage
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
| | - James B. Holland
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- USDA-ARS Plant Science Research Unit, Raleigh, NC, 27695, USA
| | - Dayane Cristina Lima
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
| | - Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
| | - Marco Lopez-Cruz
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
| | - Gustavo de los Campos
- Departments of Epidemiology & Biostatistics and Statistics & Probability, and Institute for Quantitative Health Science and Engineering, Michigan State University, 775 Woodlot Dr., East Lansing, MI, 48823, USA
| | - Wesley Barber
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
| | - Cristiano Zimmer
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
| | - Fabiani Rocha
- Corteva Agrisciences, 8305 NW 62nd Ave, Johnston, IA, 50131, USA
| | - Renaud Rincent
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
| | - Baber Ali
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
| | - Haixiao Hu
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
| | - Daniel E Runcie
- Department of Plant Sciences, University of California Davis, One Shield Drive, Davis, CA, 95616, USA
| | - Kirill Gusev
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
| | - Andrei Slabodkin
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
| | - Phillip Bax
- Smart Agri Labs, 2055 Limestone Rd STE 200-C, Wilmington, DE, 19808, USA
| | - Julie Aubert
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
| | - Hugo Gangloff
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
| | - Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190 Gif-sur-Yvette, France
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
| | - Theodore Vanrenterghem
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
| | - Carles Quesada-Traver
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
| | - Steven Yates
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
| | - Daniel Ariza-Suárez
- Molecular Plant Breeding, Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zurich, Switzerland
| | - Argeo Ulrich
- Puregene AG, Etzmatt 273, CH-4314 Zeiningen, Switzerland
- Institute of Agricultural Sciences, ETH Zurich, Universitätstrasse 2, CH-8092 Zürich, Switzerland
| | - Michele Wyler
- MWSchmid GmbH, Hauptstrasse 34, CH-8750 Glarus, Switzerland
| | - Daniel R. Kick
- USDA-ARS-MWA-PGRU, 302-A Curtis Hall, U. of MO., Columbia, MO, 65211, USA
| | - Emily S. Bellis
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
| | - Jason L. Causey
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
| | - Emilio Soriano Chavez
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
| | - Yixing Wang
- Department of Computer Science, Arkansas State University, 2105 E. Aggie Rd., Jonesboro, AR, 72401, USA
| | - Ved Piyush
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
| | - Gayara D. Fernando
- Department of Statistics, University of Nebraska - Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, 68583, USA
| | - Robert K Hu
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
| | - Rachit Kumar
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
- Medical Scientist Training Program, Perelman School of Medicine at the University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
| | - Annan J. Timon
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
| | - Rasika Venkatesh
- Genomics and Computational Biology, Perelman School of Medicine at the University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA
| | - Kenia Segura Abá
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
| | - Huan Chen
- Genetics and Genome Sciences Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
| | - Thilanka Ranaweera
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
| | - Shin-Han Shiu
- DOE Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, MI, 48824, USA
- Department of Plant Biology, Michigan State University, East Lansing, MI, 48824, USA
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Peiran Wang
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
| | - Max J. Gordon
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
| | - B K. Amos
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
| | - Sebastiano Busato
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
| | - Daniel Perondi
- NC Plant Science Initiative, North Carolina State University, 840 Oval Drive, Raleigh, NC, 27606, USA
- Department of Electrical and Computer Engineering, North Carolina State University, 890 Oval Dr, Raleigh, NC, 27606, USA
| | - Abhishek Gogna
- Department of Breeding Research, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
| | - Dennis Psaroudakis
- Department of Molecular Genetics, Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung, Corrensstraße 3, Gatersleben, 6466, Germany
| | - C. P. James Chen
- School of Animal Sciences, Virginia Tech, Blacksburg, VA, 24061, USA
| | - Hawlader A. Al-Mamun
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
| | - Monica F. Danilevicz
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
| | - Shriprabha R. Upadhyaya
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences and Centre of Applied Bioinformatics, University of Western Australia, Perth, WA, Australia
| | - Natalia de Leon
- Department of Plant and Agroecosystem Sciences, University of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
2
Li J, Zhang D, Yang F, Zhang Q, Pan S, Zhao X, Zhang Q, Han Y, Yang J, Wang K, Zhao C. TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. Plant Communications 2024; 5:100975. [PMID: 38751121 PMCID: PMC11287160 DOI: 10.1016/j.xplc.2024.100975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 04/14/2024] [Accepted: 05/11/2024] [Indexed: 06/24/2024]
Abstract
Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain and is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNN) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data. We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
Affiliation(s)
- Jinlong Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Dongfeng Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Feng Yang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Qiusi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Shouhui Pan
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Xiangyu Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Qi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Yanyun Han
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Jinliang Yang
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
| | - Kaiyi Wang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.
| | - Chunjiang Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.
3
Morales N, Anche MT, Kaczmar NS, Lepak N, Ni P, Romay MC, Santantonio N, Buckler ES, Gore MA, Mueller LA, Robbins KR. Spatio-temporal modeling of high-throughput multispectral aerial images improves agronomic trait genomic prediction in hybrid maize. Genetics 2024; 227:iyae037. [PMID: 38469622 PMCID: PMC11075545 DOI: 10.1093/genetics/iyae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 12/02/2023] [Accepted: 02/18/2024] [Indexed: 03/13/2024] Open
Abstract
Design randomizations and spatial corrections have increased understanding of genotypic, spatial, and residual effects in field experiments, but precisely measuring spatial heterogeneity in the field remains a challenge. To this end, our study evaluated approaches to improve spatial modeling using high-throughput phenotypes (HTP) via unoccupied aerial vehicle (UAV) imagery. The normalized difference vegetation index was measured by a multispectral MicaSense camera and processed using ImageBreed. In contrast to a baseline agronomic trait spatial correction and a baseline multitrait model, a two-stage approach was proposed. Using longitudinal normalized difference vegetation index data, plot-level permanent environment effects estimated spatial patterns in the field throughout the growing season. Normalized difference vegetation index permanent environment effects were separated from additive genetic effects using 2D spline, separable autoregressive models, or random regression models. The permanent environment effects were leveraged within agronomic trait genomic best linear unbiased prediction either by modeling an empirical covariance for random effects or by modeling fixed effects as an average of permanent environment effects across time or split among three growth phases. Modeling approaches were tested using simulation data and Genomes-to-Fields hybrid maize (Zea mays L.) field experiments in 2015, 2017, 2019, and 2020 for grain yield, grain moisture, and ear height. The two-stage approach improved heritability, model fit, and genotypic effect estimation compared to baseline models. Electrical conductance and elevation from a 2019 soil survey significantly improved model fit, while 2D spline permanent environment effects were most strongly correlated with the soil parameters. Simulation of field effects demonstrated improved specificity for random regression models. In summary, the use of longitudinal normalized difference vegetation index measurements increased experimental accuracy and understanding of field spatio-temporal heterogeneity.
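Note on the index used in this abstract: the normalized difference vegetation index is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The short sketch below only illustrates that standard formula. It is not taken from the paper or from ImageBreed, and the function name `ndvi`, the zero-denominator convention, and the reflectance values are all assumptions made for illustration.

```python
def ndvi(nir, red):
    """Standard NDVI formula for one pixel or one plot-level mean."""
    total = nir + red
    if total == 0:
        # Assumed convention: return 0.0 where the ratio is undefined.
        return 0.0
    return (nir - red) / total


# Hypothetical plot-level mean reflectances (NIR, red) on three flight dates,
# giving a small longitudinal NDVI series for a single plot.
flights = [(0.30, 0.10), (0.45, 0.05), (0.40, 0.08)]
series = [ndvi(nir, red) for nir, red in flights]
```

A longitudinal series like `series`, collected per plot across the season, is the kind of input the abstract's two-stage approach starts from; the spatial and genetic modeling itself is not sketched here.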
Affiliation(s)
- Nicolas Morales
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Mahlet T Anche
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Nicholas S Kaczmar
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Nicholas Lepak
- United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
| | - Pengzun Ni
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenhe District, Shenyang, Liaoning Province, PR China
| | - Maria Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
| | - Nicholas Santantonio
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| | - Edward S Buckler
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- United States Department of Agriculture-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
| | - Michael A Gore
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Lukas A Mueller
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Boyce Thompson Institute, Ithaca, NY 14853, USA
| | - Kelly R Robbins
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
4
Tuggle CK, Clarke JL, Murdoch BM, Lyons E, Scott NM, Beneš B, Campbell JD, Chung H, Daigle CL, Das Choudhury S, Dekkers JCM, Dórea JRR, Ertl DS, Feldman M, Fragomeni BO, Fulton JE, Guadagno CR, Hagen DE, Hess AS, Kramer LM, Lawrence-Dill CJ, Lipka AE, Lübberstedt T, McCarthy FM, McKay SD, Murray SC, Riggs PK, Rowan TN, Sheehan MJ, Steibel JP, Thompson AM, Thornton KJ, Van Tassell CP, Schnable PS. Current challenges and future of agricultural genomes to phenomes in the USA. Genome Biol 2024; 25:8. [PMID: 38172911 PMCID: PMC10763150 DOI: 10.1186/s13059-023-03155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 12/21/2023] [Indexed: 01/05/2024] Open
Abstract
Dramatic improvements in measuring genetic variation across agriculturally relevant populations (genomics) must be matched by improvements in identifying and measuring relevant trait variation in such populations across many environments (phenomics). Identifying the most critical opportunities and challenges in genome to phenome (G2P) research is the focus of this paper. Previously (Genome Biol, 23(1):1-11, 2022), we laid out how the Agricultural Genome to Phenome Initiative (AG2PI) will coordinate activities with USA federal government agencies, expand public-private partnerships, and engage with external stakeholders to achieve a shared vision of the future of AG2PI. Acting on this latter step, AG2PI organized the "Thinking Big: Visualizing the Future of AG2PI" two-day workshop held September 9-10, 2022, in Ames, Iowa, co-hosted with the United States Department of Agriculture's National Institute of Food and Agriculture (USDA NIFA). During the meeting, attendees were asked to use their experience and curiosity to review the current status of agricultural genome to phenome (AG2P) work and envision the future of the AG2P field. The topic summaries composing this paper are distilled from two 1.5-h small group discussions. Challenges and solutions identified across multiple topics at the workshop were explored. We end our discussion with a vision for the future of agricultural progress, identifying two areas of innovation needed: (1) innovate in genetic improvement methods development and evaluation and (2) innovate in agricultural research processes to solve societal problems. To address these needs, we then provide six specific goals that we recommend be implemented immediately in support of advancing AG2P research.
5
Lopez-Cruz M, Aguate FM, Washburn JD, de Leon N, Kaeppler SM, Lima DC, Tan R, Thompson A, De La Bretonne LW, de Los Campos G. Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America. Nat Commun 2023; 14:6904. [PMID: 37903778 PMCID: PMC10616096 DOI: 10.1038/s41467-023-42687-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/08/2023] [Accepted: 10/18/2023] [Indexed: 11/01/2023] Open
Abstract
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set's genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.
Affiliation(s)
- Marco Lopez-Cruz
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
| | - Fernando M Aguate
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Jacob D Washburn
- United States Department of Agriculture, Agricultural Research Service, University of Missouri, Columbia, MO, 65211, USA
| | - Natalia de Leon
- Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
| | - Shawn M Kaeppler
- Department of Agronomy, University of Wisconsin, Madison, WI, 53706, USA
- Wisconsin Crop Innovation Center, University of Wisconsin, Middleton, WI, 53562, USA
| | - Ruijuan Tan
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
| | - Addie Thompson
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI, 48824, USA
| | - Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48824, USA.
6
Khaipho-Burch M, Cooper M, Crossa J, de Leon N, Holland J, Lewis R, McCouch S, Murray SC, Rabbi I, Ronald P, Ross-Ibarra J, Weigel D, Buckler ES. Genetic modification can improve crop yields - but stop overselling it. Nature 2023; 621:470-473. [PMID: 37773222 PMCID: PMC11550184 DOI: 10.1038/d41586-023-02895-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 10/01/2023]
Abstract
With a changing climate and a growing population, the world increasingly needs more-productive and resilient crops. But improving them requires a knowledge of what actually works in the field.
7
Tolley SA, Brito LF, Wang DR, Tuinstra MR. Genomic prediction and association mapping of maize grain yield in multi-environment trials based on reaction norm models. Front Genet 2023; 14:1221751. [PMID: 37719703 PMCID: PMC10501150 DOI: 10.3389/fgene.2023.1221751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 08/15/2023] [Indexed: 09/19/2023] Open
Abstract
Genotype-by-environment interaction (GEI) is among the greatest challenges for maize breeding programs. Strong GEI limits both the prediction of genotype performance across variable environmental conditions and the identification of genomic regions associated with grain yield. Incorporating GEI into yield prediction models has been shown to improve prediction accuracy of yield; nevertheless, more work is needed to further understand this complex interaction across populations and environments. The main objectives of this study were to: 1) assess GEI in maize grain yield based on reaction norm models and predict hybrid performance across a gradient of environmental (EG) conditions and 2) perform a genome-wide association study (GWAS) and post-GWAS analyses for maize grain yield using data from 2014 to 2017 of the Genomes to Fields initiative hybrid trial. After quality control, 2,126 hybrids with genotypic and phenotypic data were assessed across 86 environments representing combinations of locations and years, although not all hybrids were evaluated in all environments. Heritability was greater in higher-yielding environments due to an increase in genetic variability in these environments in comparison to the low-yielding environments. GWAS was carried out for yield, and the five single nucleotide polymorphisms (SNPs) with the highest magnitude of effect were selected in each environment for follow-up analyses. Many candidate genes in proximity of selected SNPs have been previously reported with roles in stress response. Genomic prediction was performed to assess prediction accuracy of previously tested or untested hybrids in environments from a new growing season. Prediction accuracy was 0.34 for cross validation across years (CV0-Predicted EG) and 0.21 for cross validation across years with only untested hybrids (CV00-Predicted EG) when compared to Best Linear Unbiased Predictions (BLUPs) that did not utilize genotypic or environmental relationships. Prediction accuracy improved to 0.80 (CV0-Predicted EG) and 0.60 (CV00-Predicted EG) when compared to the whole-dataset model that used the genomic relationships and the environmental gradient of all environments in the study. These results identify regions of the genome for future selection to improve yield and a methodology to increase the number of hybrids evaluated across locations of a multi-environment trial through genomic prediction.
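Note on the accuracy figures above: the abstract does not say how prediction accuracy was computed. In genomic prediction it is commonly reported as the Pearson correlation between predicted values and a reference set of values, and the sketch below assumes that convention. The function name `pearson` and the toy yield values are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical reference values (e.g. BLUPs) and model predictions for four hybrids.
reference = [9.1, 10.4, 8.7, 11.2]
predicted = [9.4, 10.1, 9.0, 10.6]
accuracy = pearson(predicted, reference)
```

Under this assumed convention, the two sets of figures in the abstract differ only in which reference values the predictions are correlated against.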
Affiliation(s)
- Seth A. Tolley
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Luiz F. Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
- Diane R. Wang
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
8. Kick DR, Wallace JG, Schnable JC, Kolkman JM, Alaca B, Beissinger TM, Edwards J, Ertl D, Flint-Garcia S, Gage JL, Hirsch CN, Knoll JE, de Leon N, Lima DC, Moreta DE, Singh MP, Thompson A, Weldekidan T, Washburn JD. Yield prediction through integration of genetic, environment, and management data through deep learning. G3 (BETHESDA, MD.) 2023; 13:jkad006. [PMID: 36625555 PMCID: PMC10085787 DOI: 10.1093/g3journal/jkad006] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/28/2022] [Revised: 07/28/2022] [Accepted: 12/23/2022] [Indexed: 01/11/2023]
Abstract
Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction of the importance scores for timepoints expected to have a limited physiological basis for influencing yield: those at the extreme end of the season, nearly 200 days post planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.
Affiliation(s)
- Daniel R Kick
  - United States Department of Agriculture, Agricultural Research Service Plant Genetics Research Unit, Columbia, MO 65211, USA
  - Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
- Jason G Wallace
  - Department of Crop & Soil Science, University of Georgia, Athens, GA 30602, USA
- James C Schnable
  - Center for Plant Science Innovation and Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Judith M Kolkman
  - School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Barış Alaca
  - Division of Plant Breeding Methodology, Department of Crop Science, University of Goettingen, Goettingen 37073, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen 37073, Germany
- Timothy M Beissinger
  - Division of Plant Breeding Methodology, Department of Crop Science, University of Goettingen, Goettingen 37073, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen 37073, Germany
- Jode Edwards
  - United States Department of Agriculture, Agricultural Research Service, Ames, IA 50011, USA
- David Ertl
  - Research and Business Development, Iowa Corn Promotion Board, Johnston, IA 50131, USA
- Sherry Flint-Garcia
  - United States Department of Agriculture, Agricultural Research Service Plant Genetics Research Unit, Columbia, MO 65211, USA
- Joseph L Gage
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Joseph E Knoll
  - United States Department of Agriculture, Agricultural Research Service Crop Genetics and Breeding Research Unit, Tifton, GA 31793, USA
- Natalia de Leon
  - Department of Agronomy, University of Wisconsin, Madison, WI 53706, USA
- Dayane C Lima
  - Plant Breeding and Plant Genetics Program, University of Wisconsin, Madison, WI 53706, USA
- Danilo E Moreta
  - School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Maninder P Singh
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Addie Thompson
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Jacob D Washburn
  - United States Department of Agriculture, Agricultural Research Service Plant Genetics Research Unit, Columbia, MO 65211, USA
  - Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
9. Yu S, Kusmec AM, Wang L, Nettleton D. Fusion Learning of Functional Linear Regression with Application to Genotype-by-Environment Interaction Studies. JOURNAL OF AGRICULTURAL, BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2023. [DOI: 10.1007/s13253-023-00529-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
10. Adak A, Murray SC, Anderson SL. Temporal phenomic predictions from unoccupied aerial systems can outperform genomic predictions. G3 (BETHESDA, MD.) 2022; 13:6851143. [PMID: 36445027 PMCID: PMC9836347 DOI: 10.1093/g3journal/jkac294] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022]
Abstract
A major challenge of genetic improvement and selection is to accurately predict individuals with the highest fitness in a population without direct measurement. Over the last decade, genomic predictions (GP) based on genome-wide markers have become reliable and routine. Now phenotyping technologies, including unoccupied aerial systems (UAS, also known as drones), can characterize individuals with a data depth comparable to genomics when used throughout growth. This study, for the first time, demonstrated that the prediction power of temporal UAS phenomic data can achieve or exceed that of genomic data. UAS data containing red-green-blue (RGB) bands over 15 growth time points and multispectral (RGB, red-edge and near infrared) bands over 12 time points were compared across 280 unique maize hybrids. Through cross-validation of untested genotypes in tested environments (CV2), temporal phenomic prediction (TPP) outperformed GP (0.80 vs 0.71); TPP and GP performed similarly in 3 other cross-validation scenarios. Genome-wide association mapping using area under temporal curves of vegetation indices (VIs) revealed 24.5% of a total of 241 discovered loci (59 loci) had associations with multiple VIs, explaining up to 51% of grain yield variation, less than GP and TPP predicted. This suggests TPP, like GP, integrates small-effect loci well, improving plant fitness predictions. More importantly, TPP appeared to work successfully on unrelated individuals, unlike GP.
Affiliation(s)
- Alper Adak
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
- Seth C Murray
  - Corresponding author: Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843-2474, USA
11. Westhues CC, Simianer H, Beissinger TM. learnMET: an R package to apply machine learning methods for genomic prediction using multi-environment trial data. G3 GENES|GENOMES|GENETICS 2022; 12:6705235. [PMID: 36124944 PMCID: PMC9635651 DOI: 10.1093/g3journal/jkac226] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 07/29/2022] [Indexed: 12/04/2022]
Abstract
We introduce the R-package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or to retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data in a user-friendly way. The package is published under an MIT license and accessible on GitHub.
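The "naive" aggregation mentioned in this abstract (nonoverlapping 10-day windows) can be illustrated with a minimal Python sketch. It is not learnMET's code or API (learnMET is an R package); the function name `aggregate_windows`, the choice to average a trailing partial window over the days it contains, and the temperature values are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_windows(daily_values, window=10):
    """Average a daily series over consecutive, nonoverlapping windows.

    Assumption for this sketch: a trailing partial window (fewer than
    `window` days) is averaged over the days it does contain.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        mean(daily_values[start:start + window])
        for start in range(0, len(daily_values), window)
    ]


# Hypothetical daily mean temperatures (deg C) for 25 days after planting.
daily_temp = [float(day) for day in range(1, 26)]

# Two full 10-day windows plus one 5-day remainder.
window_means = aggregate_windows(daily_temp, window=10)
print(window_means)
```

Each window mean would then serve as one environmental covariate for the trial. A phenological approach would instead set the window boundaries from crop growth stages rather than fixed day counts.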
Affiliation(s)
- Cathy C Westhues
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
  - Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, 37075 Goettingen, Germany
- Timothy M Beissinger
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, 37075 Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, 37075 Goettingen, Germany
12. Mural RV, Sun G, Grzybowski M, Tross MC, Jin H, Smith C, Newton L, Andorf CM, Woodhouse MR, Thompson AM, Sigmon B, Schnable JC. Association mapping across a multitude of traits collected in diverse environments in maize. Gigascience 2022; 11:giac080. [PMID: 35997208 PMCID: PMC9396454 DOI: 10.1093/gigascience/giac080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Revised: 05/25/2022] [Indexed: 11/14/2022]
Abstract
Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker data (18M markers) from 2 partially overlapping maize association panels comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets enabled the identification of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated to be 3 genes based on a reference set of genes with known phenotypes. Examples were observed of both genetic loci associated with variation in diverse traits (e.g., above-ground and below-ground traits), as well as individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction.
Affiliation(s)
- Ravi V Mural
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Guangchao Sun
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Marcin Grzybowski
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Michael C Tross
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Hongyu Jin
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Christine Smith
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- Linsey Newton
  - Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Carson M Andorf
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50010, USA
  - Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Addie M Thompson
  - Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Brandi Sigmon
  - Department of Plant Pathology, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
- James C Schnable
  - Center for Plant Science Innovation, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
  - Department of Agronomy and Horticulture, University of Nebraska–Lincoln, Lincoln, NE 68588, USA
13. Washburn JD, Cimen E, Ramstein G, Reeves T, O'Briant P, McLean G, Cooper M, Hammer G, Buckler ES. Predicting phenotypes from genetic, environment, management, and historical data using CNNs. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:3997-4011. [PMID: 34448888 DOI: 10.1007/s00122-021-03943-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Accepted: 08/18/2021] [Indexed: 06/13/2023]
Abstract
Convolutional Neural Networks (CNNs) can perform similarly or better than standard genomic prediction methods when sufficient genetic, environmental, and management data are provided. Predicting phenotypes from genetic (G), environmental (E), and management (M) conditions is a long-standing challenge with implications to agriculture, medicine, and conservation. Most methods reduce the factors in a dataset (feature engineering) in a subjective and potentially oversimplified manner. Deep neural networks such as Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN) can overcome this by allowing the data itself to determine which factors are most important. CNN models were developed for predicting agronomic yield from a combination of replicated trials and historical yield survey data. The results were more accurate than standard methods when tested on held-out G, E, and M data (r = 0.50 vs. r = 0.43), and performed slightly worse than standard methods when only G was held out (r = 0.74 vs. r = 0.80). Pre-training on historical data increased accuracy compared to trial data alone. Saliency map analysis indicated the CNN has "learned" to prioritize many factors of known agricultural importance.
Affiliation(s)
- Jacob D Washburn
  - United States Department of Agriculture, Agricultural Research Service, Columbia, MO, 65211, USA
- Emre Cimen
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
  - Computational Intelligence and Optimization Laboratory, Industrial Engineering Department, Eskisehir Technical University, Eskisehir, Turkey
- Guillaume Ramstein
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8000, Aarhus, Denmark
- Timothy Reeves
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Patrick O'Briant
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
- Greg McLean
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Mark Cooper
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Graeme Hammer
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- Edward S Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, 14853, USA
  - Department of Agriculture, Agricultural Research Service, Ithaca, NY, 14850, USA
14. Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter JC, Simianer H, Beissinger TM. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. FRONTIERS IN PLANT SCIENCE 2021; 12:699589. [PMID: 34880880 PMCID: PMC8647909 DOI: 10.3389/fpls.2021.699589] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Accepted: 10/15/2021] [Indexed: 05/26/2023]
Abstract
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. 
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
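The four prediction problems listed in this abstract differ only in which environment dimension is held out (year or site) and whether tested genotypes are allowed in the test set. The sketch below shows one way such splits could be built. It is an illustration under stated assumptions, not the authors' procedure: the record layout, the helper name `split_new_environment`, and the toy trial records are hypothetical.

```python
def split_new_environment(records, env_key, held_out, unseen_only=False):
    """Split trial records into train/test sets for a held-out environment.

    env_key     -- "year" or "site": the dimension treated as new
    held_out    -- the year or site whose records form the test set
    unseen_only -- if True, keep only test genotypes absent from training
    """
    train = [r for r in records if r[env_key] != held_out]
    test = [r for r in records if r[env_key] == held_out]
    if unseen_only:
        seen = {r["genotype"] for r in train}
        test = [r for r in test if r["genotype"] not in seen]
    return train, test


# Hypothetical toy records: one row per hybrid-by-environment observation.
records = [
    {"genotype": "H1", "year": 2014, "site": "IA"},
    {"genotype": "H2", "year": 2014, "site": "WI"},
    {"genotype": "H1", "year": 2015, "site": "WI"},
    {"genotype": "H3", "year": 2015, "site": "IA"},
]

# (1) tested and new genotypes in a new year
_, test1 = split_new_environment(records, "year", 2015)
# (2) only unobserved genotypes in a new year
_, test2 = split_new_environment(records, "year", 2015, unseen_only=True)
# (3) tested and new genotypes in a new site
_, test3 = split_new_environment(records, "site", "IA")
# (4) only unobserved genotypes in a new site
_, test4 = split_new_environment(records, "site", "IA", unseen_only=True)

for name, test in [("1", test1), ("2", test2), ("3", test3), ("4", test4)]:
    print(name, [r["genotype"] for r in test])
```

In this sketch, scenarios (2) and (4) shrink only the test set. A stricter variant could also remove from training any records of the genotypes being predicted.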
Affiliation(s)
- Cathy C. Westhues
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Sofia da Silva
  - Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Malthe Schmidt
  - Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
  - Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Timothy M. Beissinger
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
15. Danilevicz MF, Bayer PE, Nestor BJ, Bennamoun M, Edwards D. Resources for image-based high-throughput phenotyping in crops and data sharing challenges. PLANT PHYSIOLOGY 2021; 187:699-715. [PMID: 34608963 PMCID: PMC8561249 DOI: 10.1093/plphys/kiab301] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/04/2020] [Accepted: 05/26/2021] [Indexed: 05/06/2023]
Abstract
High-throughput phenotyping (HTP) platforms are capable of monitoring the phenotypic variation of plants through multiple types of sensors, such as red, green, and blue (RGB) cameras, hyperspectral sensors, and computed tomography, which can be associated with environmental and genotypic data. Because of the wide range of information provided, HTP datasets represent a valuable asset to characterize crop phenotypes. As HTP becomes widely employed with more tools and data being released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may exploit these datasets either for phenotype comparison or employ them as a benchmark to assess tool performance and to support the development of tools that are better at generalizing between different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detecting pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and offer a comprehensive list of available datasets to assist crop breeders and tool developers to leverage these resources in order to accelerate crop breeding.
Affiliation(s)
- Monica F. Danilevicz
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Philipp E. Bayer
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Benjamin J. Nestor
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
- Mohammed Bennamoun
  - Department of Computer Science and Software Engineering, University of Western Australia, Perth, Western Australia 6009, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia 6009, Australia
16. Maize Yield Prediction at an Early Developmental Stage Using Multispectral Images and Genotype Data for Preliminary Hybrid Selection. REMOTE SENSING 2021. [DOI: 10.3390/rs13193976] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 01/05/2023]
Abstract
Assessing crop production in the field often requires breeders to wait until the end of the season to collect yield-related measurements, limiting the pace of the breeding cycle. Early prediction of crop performance can reduce this constraint by allowing breeders more time to focus on the highest-performing varieties. Here, we present a multimodal deep learning model for predicting the performance of maize (Zea mays) at an early developmental stage, offering the potential to accelerate crop breeding. We employed multispectral images and eight vegetation indices, collected by an uncrewed aerial vehicle approximately 60 days after sowing, over three consecutive growing cycles (2017, 2018 and 2019). The multimodal deep learning approach was used to integrate field management and genotype information with the multispectral data, providing context to the conditions that the plants experienced during the trial. Model performance was assessed using holdout data, in which the model accurately predicted the yield (RMSE 1.07 t/ha, a relative RMSE of 7.60% of 16 t/ha, and R² score 0.73) and identified the majority of high-yielding varieties, outperforming previously published models for early yield prediction. The inclusion of vegetation indices was important for model performance, with the normalized difference vegetation index and the green normalized difference vegetation index contributing the most to model performance. The model provides a decision support tool, identifying promising lines early in the field trial.
17. Swentowsky KW, Bell HS, Wills DM, Dawe RK. QTL Map of Early- and Late-Stage Perennial Regrowth in Zea diploperennis. FRONTIERS IN PLANT SCIENCE 2021; 12:707839. [PMID: 34504508 PMCID: PMC8421791 DOI: 10.3389/fpls.2021.707839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Accepted: 06/30/2021] [Indexed: 06/13/2023]
Abstract
Numerous climate change threats will necessitate a shift toward more sustainable agricultural practices during the 21st century. Conversion of annual crops to perennials that are capable of regrowing over multiple yearly growth cycles could help to facilitate this transition. Perennials can capture greater amounts of carbon and access more water and soil nutrients compared to annuals. In principle it should be possible to identify genes that confer perenniality from wild relatives and transfer them into existing breeding lines to create novel perennial crops. Two major loci controlling perennial regrowth in the maize relative Zea diploperennis were previously mapped to chromosome 2 (reg1) and chromosome 7 (reg2). Here we extend this work by mapping perennial regrowth in segregating populations involving Z. diploperennis and the maize inbreds P39 and Hp301 using QTL-seq and traditional QTL mapping approaches. The results confirmed the existence of a major perennial regrowth QTL on chromosome 2 (reg1). Although we did not observe the reg2 QTL in these populations, we discovered a third QTL on chromosome 8 which we named regrowth3 (reg3). The reg3 locus exerts its strongest effect late in the regrowth cycle. Neither reg1 nor reg3 overlapped with tiller number QTL scored in the same population, suggesting specific roles in the perennial phenotype. Our data, along with prior work, indicate that perennial regrowth in maize is conferred by relatively few major QTL.
Affiliation(s)
- Kyle W. Swentowsky
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
- Harrison S. Bell
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
- David M. Wills
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
- R. Kelly Dawe
  - Department of Plant Biology, University of Georgia, Athens, GA, United States
  - Department of Genetics, University of Georgia, Athens, GA, United States
18. Runcie DE, Qu J, Cheng H, Crawford L. MegaLMM: Mega-scale linear mixed models for genomic predictions with thousands of traits. Genome Biol 2021; 22:213. [PMID: 34301310 PMCID: PMC8299638 DOI: 10.1186/s13059-021-02416-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/02/2020] [Accepted: 06/23/2021] [Indexed: 12/21/2022]
Abstract
Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using three examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to significantly improve genetic value prediction accuracy.
Affiliation(s)
- Daniel E. Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA USA
- Jiayi Qu
  - Department of Plant Sciences, University of California Davis, Davis, CA USA
- Hao Cheng
  - Department of Plant Sciences, University of California Davis, Davis, CA USA
19. Rogers AR, Dunne JC, Romay C, Bohn M, Buckler ES, Ciampitti IA, Edwards J, Ertl D, Flint-Garcia S, Gore MA, Graham C, Hirsch CN, Hood E, Hooker DC, Knoll J, Lee EC, Lorenz A, Lynch JP, McKay J, Moose SP, Murray SC, Nelson R, Rocheford T, Schnable JC, Schnable PS, Sekhon R, Singh M, Smith M, Springer N, Thelen K, Thomison P, Thompson A, Tuinstra M, Wallace J, Wisser RJ, Xu W, Gilmour AR, Kaeppler SM, De Leon N, Holland JB. The importance of dominance and genotype-by-environment interactions on grain yield variation in a large-scale public cooperative maize experiment. G3-GENES GENOMES GENETICS 2021; 11:6062399. [PMID: 33585867 DOI: 10.1093/g3journal/jkaa050] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/10/2020] [Accepted: 11/07/2020] [Indexed: 11/12/2022]
Abstract
High-dimensional and high-throughput genomic, field performance, and environmental data are becoming increasingly available to crop breeding programs, and their integration can facilitate genomic prediction within and across environments and provide insights into the genetic architecture of complex traits and the nature of genotype-by-environment interactions. To partition trait variation into additive and dominance (main effect) genetic and corresponding genetic-by-environment variances, and to identify specific environmental factors that influence genotype-by-environment interactions, we curated and analyzed genotypic and phenotypic data on 1918 maize (Zea mays L.) hybrids and environmental data from 65 testing environments. For grain yield, dominance variance was similar in magnitude to additive variance, and genetic-by-environment variances were more important than genetic main effect variances. Models involving both additive and dominance relationships best fit the data and modeling unique genetic covariances among all environments provided the best characterization of the genotype-by-environment interaction patterns. Similarity of relative hybrid performance among environments was modeled as a function of underlying weather variables, permitting identification of weather covariates driving correlations of genetic effects across environments. The resulting models can be used for genomic prediction of mean hybrid performance across populations of environments tested or for environment-specific predictions. These results can also guide efforts to incorporate high-throughput environmental data into genomic prediction models and predict values in new environments characterized with the same environmental characteristics.
Affiliation(s)
- Anna R Rogers
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
- Jeffrey C Dunne
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Martin Bohn
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Edward S Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS Plant, Soil, and Nutrition Research Unit, Cornell University, Ithaca, NY 14853, USA
- Jode Edwards
  - Department of Agronomy, Iowa State University, Ames, IA 50011, USA
  - USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- David Ertl
  - Iowa Corn Promotion Board, Johnston, IA 50131, USA
- Sherry Flint-Garcia
  - USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211, USA
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Christopher Graham
  - Plant Science Department, West River Agricultural Center, South Dakota State University, Rapid City, SD 57769, USA
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Elizabeth Hood
  - College of Agriculture, Arkansas State University, Jonesboro, AR 72467, USA
- David C Hooker
  - Department of Plant Agriculture, Ridgetown Campus, University of Guelph, Ridgetown, ON N0P 2C0, Canada
- Joseph Knoll
  - USDA-ARS Crop Genetics and Breeding Research Unit, Tifton, GA 31793, USA
- Elizabeth C Lee
  - Department of Plant Agriculture, University of Guelph, Guelph N1G 2W1, Canada
- Aaron Lorenz
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Jonathan P Lynch
  - Department of Plant Science, Penn State University, University Park, PA 16802, USA
- John McKay
  - Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523, USA
- Stephen P Moose
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Seth C Murray
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Rebecca Nelson
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Torbert Rocheford
  - Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- James C Schnable
  - Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE 68583, USA
- Patrick S Schnable
  - Department of Agronomy, Iowa State University, Ames, IA 50011, USA
  - Plant Sciences Institute, Iowa State University, Ames, IA 50011, USA
- Rajandeep Sekhon
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Maninder Singh
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Margaret Smith
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Nathan Springer
  - Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE 68583, USA
- Kurt Thelen
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, USA
- Peter Thomison
  - Department of Horticulture and Crop Science, The Ohio State University, Columbus, OH 43210, USA
- Addie Thompson
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN 55108, USA
- Mitch Tuinstra
  - Department of Agronomy, Purdue University, West Lafayette, IN 47907, USA
- Jason Wallace
  - Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
- Randall J Wisser
  - Department of Plant and Soil Sciences, University of Delaware, Newark, DE 19716, USA
- Wenwei Xu
  - Texas A&M AgriLife Research, Texas A&M University, Lubbock, TX 79403, USA
- Shawn M Kaeppler
  - Department of Agronomy, University of Wisconsin, Madison, WI 53706, USA
- Natalia De Leon
  - Department of Agronomy, University of Wisconsin, Madison, WI 53706, USA
- James B Holland
  - Program in Genetics, North Carolina State University, Raleigh, NC 27695, USA
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC 27695, USA
  - USDA-ARS Plant Science Research Unit, North Carolina State University, Raleigh, NC 27695-7620, USA
20
Adak A, Murray SC, Anderson SL, Popescu SC, Malambo L, Romay MC, de Leon N. Unoccupied aerial systems discovered overlooked loci capturing the variation of entire growing period in maize. THE PLANT GENOME 2021; 14:e20102. [PMID: 34009740 DOI: 10.1002/tpg2.20102] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/01/2020] [Accepted: 03/29/2021] [Indexed: 06/12/2023]
Abstract
Traditional phenotyping methods, coupled with genetic mapping in segregating populations, have identified loci governing complex traits in many crops. Unoccupied aerial systems (UAS)-based phenotyping has helped to reveal a more novel and dynamic relationship between time-specific associated loci and complex traits that previously could not be evaluated. Over 1,500 maize (Zea mays L.) hybrid row plots containing 280 different replicated maize hybrids from the Genomes to Fields (G2F) project were evaluated agronomically and using UAS in 2017. Weekly UAS flights captured variation in plant heights during the growing season under three different management conditions each year: optimal planting with irrigation (G2FI), optimal dryland planting without irrigation (G2FD), and a stressed late planting (G2LA). Plant heights from different flights were ranked based on importance for yield using a random forest (RF) algorithm. Plant heights captured by early flights in G2FI trials had higher importance (based on Gini scores) for predicting maize grain yield (GY) but also higher accuracies in genomic predictions, which fluctuated for G2FD (-0.06∼0.73), G2FI (0.33∼0.76), and G2LA (0.26∼0.78) trials. A genome-wide association analysis discovered 52 significant single nucleotide polymorphisms (SNPs); seven were found consistently in more than one flight or trial, and 45 were flight or trial specific. Total cumulative marker effects for each chromosome's contributions to plant height also changed depending on flight. Using UAS phenotyping, this study showed that many candidate genes putatively play a role in the regulation of plant architecture even in relatively early stages of maize growth and development.
Affiliation(s)
- Alper Adak
  - Dept. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX, 77843-2474, USA
- Seth C Murray
  - Dept. of Soil and Crop Sciences, Texas A&M Univ., College Station, TX, 77843-2474, USA
- Steven L Anderson
  - Dept. of Environmental Hort., Institute of Food and Agricultural Sciences, Mid-Florida Research and Education Center, University of Florida, Apopka, FL, USA
- Sorin C Popescu
  - Dept. of Ecosystem Science and Management, Texas A&M Univ., College Station, TX, 77843-2120, USA
- Lonesome Malambo
  - Dept. of Ecosystem Science and Management, Texas A&M Univ., College Station, TX, 77843-2120, USA
- M Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
- Natalia de Leon
  - Department of Agronomy, University of Wisconsin, 1575 Linden Drive, Madison, WI, 53706, USA
21
Fritsche-Neto R, Galli G, Borges KLR, Costa-Neto G, Alves FC, Sabadin F, Lyra DH, Morais PPP, Braatz de Andrade LR, Granato I, Crossa J. Optimizing Genomic-Enabled Prediction in Small-Scale Maize Hybrid Breeding Programs: A Roadmap Review. FRONTIERS IN PLANT SCIENCE 2021; 12:658267. [PMID: 34276721 PMCID: PMC8281958 DOI: 10.3389/fpls.2021.658267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 05/10/2021] [Indexed: 06/13/2023]
Abstract
The usefulness of genomic prediction (GP) for many animal and plant breeding programs has been highlighted by many studies in the last 20 years. In maize breeding programs, mostly dedicated to delivering more highly adapted and productive hybrids, this approach has proved successful for both large- and small-scale breeding programs worldwide. Here, we present some of the strategies developed to improve the accuracy of GP in tropical maize, focusing on its use under the low-budget and small-scale conditions faced by most hybrid breeding programs in developing countries. We highlight the most important outcomes obtained by the University of São Paulo (USP, Brazil) and how they can improve the accuracy of prediction in tropical maize hybrids. Our roadmap starts with the efforts for germplasm characterization, moving on to the practices for mating design, and the selection of the genotypes that are used to compose the training population in field phenotyping trials. Factors including population structure and the importance of non-additive effects (dominance and epistasis) controlling the desired trait are also outlined. Finally, we explain how the source of the molecular markers, environmental data, and the modeling of genotype-environment interaction can affect the accuracy of GP. Results of 7 years of research in a public maize hybrid breeding program under tropical conditions are discussed, and with the great advances that have been made, we find that what is yet to come is exciting. The use of open-source software for the quality control of molecular markers, implementing GP, and envirotyping pipelines may reduce costs in an efficient computational manner. We conclude that exploring new models/tools using high-throughput phenotyping data along with large-scale envirotyping may bring more resolution and realism when predicting genotype performances. Despite the initial costs, mostly for genotyping, the GP platforms in combination with these other data sources can be a cost-effective approach for predicting the performance of maize hybrids for a large set of growing conditions.
Affiliation(s)
- Roberto Fritsche-Neto
  - Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Giovanni Galli
  - Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Karina Lima Reis Borges
  - Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Germano Costa-Neto
  - Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Filipe Couto Alves
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, United States
- Felipe Sabadin
  - Laboratory of Allogamous Plant Breeding, Genetics Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, Brazil
- Danilo Hottis Lyra
  - Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, United Kingdom
- Italo Granato
  - Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux (LEPSE), Institut National de la Recherche Agronomique (INRA), Univ. Montpellier, SupAgro, Montpellier, France
- Jose Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Carretera México - Veracruz, Texcoco, Mexico
  - Colegio de Posgraduado, Montecillo, Mexico
22
Jarquin D, de Leon N, Romay C, Bohn M, Buckler ES, Ciampitti I, Edwards J, Ertl D, Flint-Garcia S, Gore MA, Graham C, Hirsch CN, Holland JB, Hooker D, Kaeppler SM, Knoll J, Lee EC, Lawrence-Dill CJ, Lynch JP, Moose SP, Murray SC, Nelson R, Rocheford T, Schnable JC, Schnable PS, Smith M, Springer N, Thomison P, Tuinstra M, Wisser RJ, Xu W, Yu J, Lorenz A. Utility of Climatic Information via Combining Ability Models to Improve Genomic Prediction for Yield Within the Genomes to Fields Maize Project. Front Genet 2021; 11:592769. [PMID: 33763106 PMCID: PMC7982677 DOI: 10.3389/fgene.2020.592769] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/08/2020] [Accepted: 12/21/2020] [Indexed: 11/29/2022]
Abstract
Genomic prediction provides an efficient alternative to conventional phenotypic selection for developing improved cultivars with desirable characteristics. New and improved methods for genomic prediction are continually being developed that attempt to deal with the integration of data types beyond genomic information. Modern automated weather systems offer the opportunity to capture continuous data on a range of environmental parameters at specific field locations. In principle, this information could characterize training and target environments and enhance predictive ability by incorporating weather characteristics as part of the genotype-by-environment (G×E) interaction component in prediction models. We assessed the usefulness of including weather data variables in genomic prediction models using a naïve environmental kinship model across 30 environments comprising the Genomes to Fields (G2F) initiative in 2014 and 2015. Specifically, four different prediction scenarios were evaluated: (i) tested genotypes in observed environments; (ii) untested genotypes in observed environments; (iii) tested genotypes in unobserved environments; and (iv) untested genotypes in unobserved environments. A set of 1,481 unique hybrids were evaluated for grain yield. Evaluations were conducted using five different models including main effect of environments; general combining ability (GCA) effects of the maternal and paternal parents modeled using the genomic relationship matrix; specific combining ability (SCA) effects between maternal and paternal parents; interactions between genetic (GCA and SCA) effects and environmental effects; and finally interactions between the genetic effects and environmental covariates. Incorporation of the genotype-by-environment interaction term improved predictive ability across all scenarios. However, predictive ability was not improved through inclusion of naïve environmental covariates in G×E models. More research should be conducted to link the observed weather conditions with important physiological aspects in plant development to improve predictive ability through the inclusion of weather data.
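The four prediction scenarios named in this abstract can be illustrated with a small sketch. It assumes only that the training data are summarized as a set of genotype identifiers and a set of environment identifiers; the function name, identifiers, and labels below are hypothetical and are not taken from the cited paper.

```python
def prediction_scenario(genotype, environment, train_genotypes, train_environments):
    """Label a (genotype, environment) test case as scenario i, ii, iii, or iv.

    Hypothetical helper: the labels follow the order given in the abstract,
    but the function itself is an illustration, not the authors' code.
    """
    tested = genotype in train_genotypes          # genotype seen in training
    observed = environment in train_environments  # environment seen in training
    if tested and observed:
        return "i"    # tested genotype, observed environment
    if not tested and observed:
        return "ii"   # untested genotype, observed environment
    if tested and not observed:
        return "iii"  # tested genotype, unobserved environment
    return "iv"       # untested genotype, unobserved environment


if __name__ == "__main__":
    # Made-up identifiers, used only to exercise the function.
    train_genotypes = {"H1", "H2"}
    train_environments = {"ENV_A", "ENV_B"}
    for case in [("H1", "ENV_A"), ("H3", "ENV_A"), ("H1", "ENV_C"), ("H3", "ENV_C")]:
        print(case, prediction_scenario(*case, train_genotypes, train_environments))
```

Scenario iv is generally regarded as the hardest case, because neither the hybrid nor the environment contributes records to the training set.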
Affiliation(s)
- Diego Jarquin
  - Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, United States
- Natalia de Leon
  - Department of Agronomy, University of Wisconsin, Madison, WI, United States
- Cinta Romay
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, United States
- Martin Bohn
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Edward S Buckler
  - Institute for Genomic Diversity, Cornell University, Ithaca, NY, United States
  - U.S. Department of Agriculture - Agricultural Research Service Plant, Soil, and Nutrition Research Unit, Cornell University, Ithaca, NY, United States
- Ignacio Ciampitti
  - Department of Agronomy, Kansas State University, Manhattan, KS, United States
- Jode Edwards
  - Department of Agronomy, Iowa State University, Ames, IA, United States
  - U.S. Department of Agriculture - Agricultural Research Service Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA, United States
- David Ertl
  - Iowa Corn Promotion Board, Johnston, IA, United States
- Sherry Flint-Garcia
  - U.S. Department of Agriculture - Agricultural Research Service Plant Genetics Research Unit, University of Missouri, Columbia, MO, United States
- Michael A Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Christopher Graham
  - Plant Science Department, West River Agricultural Center, South Dakota State University, Rapid City, SD, United States
- Candice N Hirsch
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, United States
- James B Holland
  - U.S. Department of Agriculture - Agricultural Research Service Plant Science Research Unit, North Carolina State University, Raleigh, NC, United States
- David Hooker
  - Department of Plant Agriculture, Ridgetown Campus, University of Guelph, Ridgetown, ON, Canada
- Shawn M Kaeppler
  - Department of Agronomy, University of Wisconsin, Madison, WI, United States
- Joseph Knoll
  - U.S. Department of Agriculture - Agricultural Research Service Crop Genetics and Breeding Research Unit, Tifton, GA, United States
- Elizabeth C Lee
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Carolyn J Lawrence-Dill
  - Department of Agronomy, Iowa State University, Ames, IA, United States
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
  - Plant Sciences Institute, Iowa State University, Ames, IA, United States
- Jonathan P Lynch
  - Department of Plant Science, Penn State University, University Park, PA, United States
- Stephen P Moose
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Seth C Murray
  - Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, United States
- Rebecca Nelson
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Torbert Rocheford
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- James C Schnable
  - Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, United States
- Patrick S Schnable
  - U.S. Department of Agriculture - Agricultural Research Service Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA, United States
  - Plant Sciences Institute, Iowa State University, Ames, IA, United States
- Margaret Smith
  - U.S. Department of Agriculture - Agricultural Research Service Plant, Soil, and Nutrition Research Unit, Cornell University, Ithaca, NY, United States
- Nathan Springer
  - Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, United States
- Peter Thomison
  - Department of Horticulture and Crop Science, The Ohio State University, Columbus, OH, United States
- Mitch Tuinstra
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Randall J Wisser
  - Department of Plant and Soil Sciences, University of Delaware, Newark, DE, United States
- Wenwei Xu
  - Texas A&M AgriLife Research, Texas A&M University, Lubbock, TX, United States
- Jianming Yu
  - Department of Agronomy, Iowa State University, Ames, IA, United States
- Aaron Lorenz
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, United States
23
Nuccio ML, Claeys H, Heyndrickx KS. CRISPR-Cas technology in corn: a new key to unlock genetic knowledge and create novel products. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2021; 41:11. [PMID: 37309473 PMCID: PMC10236071 DOI: 10.1007/s11032-021-01200-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/01/2020] [Accepted: 01/04/2021] [Indexed: 06/14/2023]
Abstract
Since their inception in 2012, CRISPR-Cas technologies have taken the life science community by storm. Maize genetics research is no exception. Investigators around the world have adapted CRISPR tools to advance maize genetics research in many ways. The principal application has been targeted mutagenesis to confirm candidate genes identified using map-based methods. Researchers are also developing tools to more effectively apply CRISPR-Cas technologies to maize because successful application of CRISPR-Cas relies on target gene identification, guide RNA development, vector design and construction, CRISPR-Cas reagent delivery to maize tissues, and plant characterization, each contributing unique challenges to CRISPR-Cas efficacy. Recent advances continue to chip away at major barriers that prevent more widespread use of CRISPR-Cas technologies in maize, including germplasm-independent delivery of CRISPR-Cas reagents and production of high-resolution genomic data in relevant germplasm to facilitate CRISPR-Cas experimental design. This has led to the development of novel breeding tools to advance maize genetics and demonstrations of how CRISPR-Cas technologies might be used to enhance maize germplasm. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-021-01200-9.
24
Morales N, Bauchet GJ, Tantikanjana T, Powell AF, Ellerbrock BJ, Tecle IY, Mueller LA. High density genotype storage for plant breeding in the Chado schema of Breedbase. PLoS One 2020; 15:e0240059. [PMID: 33175872 PMCID: PMC7657515 DOI: 10.1371/journal.pone.0240059] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/11/2020] [Accepted: 09/17/2020] [Indexed: 12/24/2022]
Abstract
Modern breeding programs routinely use genome-wide information for selecting individuals to advance. The large volumes of genotypic information required present a challenge for data storage and query efficiency. Major use cases require genotyping data to be linked with trait phenotyping data. In contrast to phenotyping data that are often stored in relational database schemas, next-generation genotyping data are traditionally stored in non-relational storage systems due to their extremely large scope. This study presents a novel data model implemented in Breedbase (https://breedbase.org/) for uniting relational phenotyping data and non-relational genotyping data within the open-source PostgreSQL database engine. Breedbase is an open-source web database designed to manage all of a breeder's informatics needs: management of field experiments, phenotypic and genotypic data collection and storage, and statistical analyses. The genotyping data are stored in a PostgreSQL data type known as binary JavaScript Object Notation (JSONb), where the JSON structures closely follow the Variant Call Format (VCF) data model. The Breedbase genotyping data model can handle different ploidy levels, structural variants, and any genotype encoded in VCF. JSONb is both compressed and indexed, resulting in a space- and time-efficient system. Furthermore, file caching maximizes data retrieval performance. Integration of all breeding data within the Chado database schema retains referential integrity that may be lost when genotyping and phenotyping data are stored in separate systems. Benchmarking demonstrates that the system is fast enough for computation of a genomic relationship matrix (GRM) and genome-wide association study (GWAS) for datasets involving 1,325 diploid Zea mays, 314 triploid Musa acuminata, and 924 diploid Manihot esculenta samples genotyped with 955,690, 142,119, and 287,952 genotyping-by-sequencing (GBS) markers, respectively.
Collapse
Affiliation(s)
- Nicolas Morales
  - Plant Breeding and Genetics, Cornell University, Ithaca, NY, United States of America
  - Boyce Thompson Institute, Ithaca, NY, United States of America
- Isaak Y. Tecle
  - Boyce Thompson Institute, Ithaca, NY, United States of America
25
Anche MT, Kaczmar NS, Morales N, Clohessy JW, Ilut DC, Gore MA, Robbins KR. Temporal covariance structure of multi-spectral phenotypes and their predictive ability for end-of-season traits in maize. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2020; 133:2853-2868. [PMID: 32613265 PMCID: PMC7497340 DOI: 10.1007/s00122-020-03637-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Accepted: 06/16/2020] [Indexed: 05/12/2023]
Abstract
Heritable variation in phenotypes extracted from multi-spectral images (MSIs) and strong genetic correlations with end-of-season traits indicate the value of MSIs for crop improvement and modeling of plant growth curves. Vegetation indices (VIs) derived from multi-spectral imaging (MSI) platforms can be used to study properties of crop canopy, providing non-destructive phenotypes that could be used to better understand growth curves throughout the growing season. To investigate the amount of variation present in several VIs and their relationship with important end-of-season traits, genetic and residual (co)variances for VIs, grain yield and moisture were estimated using data collected from maize hybrid trials. The VIs considered were Normalized Difference Vegetation Index (NDVI), Green NDVI, Red Edge NDVI (NDRE), Soil-Adjusted Vegetation Index, Enhanced Vegetation Index and simple Ratio of Near Infrared to Red (Red) reflectance. Genetic correlations of VIs with grain yield and moisture were used to fit multi-trait models for prediction of end-of-season traits and evaluated using within site/year cross-validation. To explore alternatives to fitting multiple phenotypes from MSI, random regression models with linear splines were fit using data collected in 2016 and 2017. Heritability estimates ranging from 0.10 to 0.82 were observed, indicating that there is a considerable amount of genetic variation in these VIs. Furthermore, strong genetic and residual correlations of the VIs NDVI and NDRE with grain yield and moisture were found. Considerable increases in prediction accuracy were observed from the multi-trait model when using NDVI and NDRE as secondary traits. Finally, random regression with a linear spline function shows potential to be used as an alternative to mixed models to fit VIs from multiple time points.
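As an illustration of how several of the vegetation indices named in this abstract are commonly computed from band reflectances, here is a minimal sketch. It uses the widely cited textbook definitions; the exact formulas, band definitions, and soil-adjustment factor used in the cited study are not given here, so the function names and the default `soil_factor=0.5` are assumptions.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b), or None if the sum is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """Green NDVI: the green band replaces the red band."""
    return normalized_difference(nir, green)


def ndre(nir, red_edge):
    """Red Edge NDVI: the red-edge band replaces the red band."""
    return normalized_difference(nir, red_edge)


def simple_ratio(nir, red):
    """Simple ratio of near-infrared to red reflectance."""
    return None if red == 0 else nir / red


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor=0.5 is an assumed default."""
    denominator = nir + red + soil_factor
    if denominator == 0:
        return None
    return (1 + soil_factor) * (nir - red) / denominator


if __name__ == "__main__":
    # Made-up reflectance values, for illustration only.
    nir, red = 0.5, 0.1
    print("NDVI:", ndvi(nir, red))
    print("SR:", simple_ratio(nir, red))
    print("SAVI:", savi(nir, red))
```

In practice these functions would be applied per pixel or per plot mean of an orthomosaic; that aggregation step is outside the scope of this sketch.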
Affiliation(s)
- Mahlet T. Anche
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
- Nicholas S. Kaczmar
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
  - Present Address: Horticulture Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
- Nicolas Morales
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
- James W. Clohessy
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
  - Present Address: North Florida Research and Education Center, Plant Pathology Department, University of Florida, Quincy, FL 32351 USA
- Daniel C. Ilut
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
- Michael A. Gore
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
- Kelly R. Robbins
  - Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA