1
Mazaheri-Tirani M, Dayani S, Mobarakeh MI. Application of machine learning algorithms for predicting the life-long physiological effects of zinc oxide Micro/Nano particles on Carum copticum. BMC Plant Biology 2024; 24:970. [PMID: 39415085 DOI: 10.1186/s12870-024-05662-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 10/03/2024] [Indexed: 10/18/2024]
Abstract
Nanoparticles impose multidimensional effects on living cells that significantly vary among different studies. Machine learning (ML) methods are recommended to elucidate more consistent and predictable relations among the affected parameters. In this study, nine ML algorithms [Support-Vector Regression (SVR), Linear, Bagging, Stochastic Gradient Descent (SGD), Gaussian Process, Random Sample Consensus (RANSAC), Partial Least Squares (PLS), Kernel Ridge, and Random Forest] were applied to evaluate their efficiency in predicting the effects of zinc oxide nanoparticles (ZnO NPs: 0.5, 1, 5, 25, and 125 µM) and microparticles (ZnO MPs: 1, 5, 25, and 125 µM) on Carum copticum. The plant root/shoot biomass; number of leaves, branches, umbellates, and flowers; protein content; reducing sugars; phenolic compounds; chlorophylls (a, b, Total); carotenoids; anthocyanins; H2O2; proline; malondialdehyde (MDA); tissue zinc content; superoxide dismutase (SOD) activity; and media ΔpH were measured and considered input variables. All levels of ZnO MPs treatments increased growth parameters compared to the control (ZnSO4). The highest shoot/root fresh and dry mass were recorded at 5 µM ZnO MPs compared with the control. The root fresh/dry mass under ZnO NPs treatments was more sensitive than shoot parameters. The number of flowers increased by 134 and 79% in MPs and NPs treatments compared to the control, respectively. ZnO NPs reduced protein content by up to 81% in 125 µM NPs compared to ZnSO4. Reducing sugar content increased to 25, 40 and 36% in 5, 25, 125 µM MPs and 67, 68, 26, 26 and 21% in 0.5, 1, 5, 25 and 125 µM NPs treatments, respectively. The pH alteration was more significant under NPs and affected zinc uptake. All levels of ZnO NPs treatments increased growth parameters compared to the control.
All ML algorithms showed varied efficiencies in predicting the nonlinear relationships among parameters, with higher efficiency in predicting the behavior of root and shoot dry mass, root fresh weight and number of flowers according to the R² index. The model obtained from SVR with the radial basis function (RBF) kernel was selected as a comprehensive model for predicting and determining the efficacy of the results.
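The abstract ranks the models by the R² index. As a minimal sketch, assuming R² here means the usual coefficient of determination, it could be computed as follows. The function name and the sample values are illustrative and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical trait values (arbitrary units), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 3))
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the observed mean.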
Affiliation(s)
- Maryam Mazaheri-Tirani
  - Department of Biology, Faculty of Science, University of Jiroft, Jiroft, 78671-61167, Iran
- Soleyman Dayani
  - Department of Biotechnology, Faculty of Agriculture and Natural Resources, Imam Khomeini International University (IKIU), Qazvin, 34149-16818, Iran
2
Ghavi Hossein-Zadeh N. An overview of recent technological developments in bovine genomics. Vet Anim Sci 2024; 25:100382. [PMID: 39166173 PMCID: PMC11334705 DOI: 10.1016/j.vas.2024.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024] Open
Abstract
Cattle are regarded as highly valuable animals because of their milk, beef, dung, fur, and ability to draft. The scientific community has tried a number of strategies to improve the genetic makeup of bovine germplasm. To ensure higher returns for the dairy and beef industries, researchers face their greatest challenge in improving commercially important traits. One of the biggest developments in the last few decades in the creation of instruments for cattle genetic improvement is the discovery of the genome. Breeding livestock is being revolutionized by genomic selection made possible by the availability of medium- and high-density single nucleotide polymorphism (SNP) arrays coupled with sophisticated statistical techniques. It is becoming easier to access high-dimensional genomic data in cattle. Continuously declining genotyping costs and an increase in services that use genomic data to increase return on investment have both made a significant contribution to this. The field of genomics has come a long way thanks to groundbreaking discoveries such as radiation-hybrid mapping, in situ hybridization, synteny analysis, somatic cell genetics, cytogenetic maps, molecular markers, association studies for quantitative trait loci, high-throughput SNP genotyping, whole-genome shotgun sequencing to whole-genome mapping, and genome editing. These advancements have had a significant positive impact on the field of cattle genomics. This manuscript aimed to review recent advances in genomic technologies for cattle breeding and future prospects in this field.
Affiliation(s)
- Navid Ghavi Hossein-Zadeh
  - Department of Animal Science, Faculty of Agricultural Sciences, University of Guilan, Rasht, 41635-1314, Iran
3
Derbyshire MC, Newman TE, Thomas WJW, Batley J, Edwards D. The complex relationship between disease resistance and yield in crops. Plant Biotechnology Journal 2024; 22:2612-2623. [PMID: 38743906 PMCID: PMC11331782 DOI: 10.1111/pbi.14373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 04/03/2024] [Accepted: 04/28/2024] [Indexed: 05/16/2024]
Abstract
In plants, growth and defence are controlled by many molecular pathways that are antagonistic to one another. This results in a 'growth-defence trade-off', where plants temporarily reduce growth in response to pests or diseases. Due to this antagonism, genetic variants that improve resistance often reduce growth and vice versa. Therefore, in natural populations, the most disease resistant individuals are often the slowest growing. In crops, slow growth may translate into a yield penalty, but resistance is essential for protecting yield in the presence of disease. Therefore, plant breeders must balance these traits to ensure optimal yield potential and yield stability. In crops, both qualitative and quantitative disease resistance are often linked with genetic variants that cause yield penalties, but this is not always the case. Furthermore, both crop yield and disease resistance are complex traits influenced by many aspects of the plant's physiology, morphology and environment, and the relationship between the molecular growth-defence trade-off and disease resistance-yield antagonism is not well-understood. In this article, we highlight research from the last 2 years on the molecular mechanistic basis of the antagonism between defence and growth. We then discuss the interaction between disease resistance and crop yield from a breeding perspective, outlining the complexity and nuances of this relationship and where research can aid practical methods for simultaneous improvement of yield potential and disease resistance.
Affiliation(s)
- Mark C. Derbyshire
  - Centre for Crop and Disease Management, Curtin University, Perth, Western Australia, Australia
- Toby E. Newman
  - Centre for Crop and Disease Management, Curtin University, Perth, Western Australia, Australia
- William J. W. Thomas
  - Centre for Applied Bioinformatics and School of Biological Science, University of Western Australia, Perth, Western Australia, Australia
- Jacqueline Batley
  - Centre for Applied Bioinformatics and School of Biological Science, University of Western Australia, Perth, Western Australia, Australia
- David Edwards
  - Centre for Applied Bioinformatics and School of Biological Science, University of Western Australia, Perth, Western Australia, Australia
4
Inamori M, Kimura T, Mori M, Tarumoto Y, Hattori T, Hayano M, Umeda M, Iwata H. Machine learning for genomic and pedigree prediction in sugarcane. The Plant Genome 2024:e20486. [PMID: 38923818 DOI: 10.1002/tpg2.20486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 05/07/2024] [Accepted: 05/08/2024] [Indexed: 06/28/2024]
Abstract
Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genomes. Considering non-additive genetic effects is essential in genomic prediction (GP) models of crops with highly heterozygous polyploid genomes. This study incorporates non-additive genetic effects and pedigree information using machine learning methods to track sugarcane breeding lines and enhance the prediction by assessing the degree of association between genotypes. This study measured the stalk biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program. Subsequently, we conducted analyses based on the marker genotypes of 33,149 single-nucleotide polymorphisms. To validate the accuracy of GP in the population, we first assessed the prediction accuracy of the best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Prediction accuracy was assessed using two different cross-validation methods: repeated 10-fold cross-validation and leave-one-family-out cross-validation. The accuracy of GP of the first and second methods ranged from 0.36 to 0.74 and 0.15 to 0.63, respectively. Next, we compared the prediction accuracy of BLUP and two machine learning methods: random forests and simulation annealing ensemble (SAE), a newly developed machine learning method that explicitly models the interaction between variables. Both pedigree and genomic information were utilized as input in these methods. Through repeated 10-fold cross-validation, we found that the accuracy of the machine learning methods surpassed that of BLUP in most cases. In leave-one-family-out cross-validation, SAE demonstrated the highest accuracy among the methods. These results underscore the effectiveness of GP in Japanese sugarcane breeding and highlight the significant potential of machine learning methods.
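The two validation schemes differ in how clones are assigned to the test set. With random 10-fold splits, clones from the same family can appear in both the training and the test set, whereas leave-one-family-out holds out every member of one family at a time. The following is a minimal sketch of the second scheme. The function name and the family labels are hypothetical and do not come from the study's code.

```python
from collections import defaultdict


def leave_one_family_out(family_of_clone):
    """Yield (family, train_indices, test_indices), holding out one family at a time."""
    members = defaultdict(list)
    for index, family in enumerate(family_of_clone):
        members[family].append(index)
    for family, test_indices in members.items():
        held_out = set(test_indices)
        # Every clone outside the held-out family is used for training.
        train_indices = [i for i in range(len(family_of_clone)) if i not in held_out]
        yield family, train_indices, test_indices


# Hypothetical family labels for six clones (the study used 297 clones from 87 families).
families = ["F1", "F1", "F2", "F3", "F3", "F3"]
for family, train, test in leave_one_family_out(families):
    print(family, train, test)
```

Each clone lands in the test set exactly once, and the model never sees a full sibling of the clone it is asked to predict.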
Affiliation(s)
- Minoru Inamori
  - Laboratory of Biometry and Bioinformatics, Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Tatsuro Kimura
  - Toyota Motor Corporation, New Business Planning Division, Agriculture & Biotechnology Business Department, Toyota, Japan
- Masaaki Mori
  - Toyota Motor Corporation, Environment Affairs and Engineering Management Division, CN Advanced Engineering Development Center, Tokyo, Japan
- Yusuke Tarumoto
  - NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
- Taiichiro Hattori
  - NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
  - NARO Kyushu Okinawa Agricultural Research Center, Itoman Resident Office, Itoman, Japan
- Michiko Hayano
  - NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
  - NARO Institute for Agro-Environmental Science, Tsukuba, Japan
- Makoto Umeda
  - NARO Kyushu Okinawa Agricultural Research Center, Tanegashima Sugarcane Breeding Site, Nishinoomote, Japan
- Hiroyoshi Iwata
  - Laboratory of Biometry and Bioinformatics, Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
5
Dwivedi SL, Heslop-Harrison P, Amas J, Ortiz R, Edwards D. Epistasis and pleiotropy-induced variation for plant breeding. Plant Biotechnology Journal 2024. [PMID: 38875130 DOI: 10.1111/pbi.14405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 05/07/2024] [Accepted: 05/24/2024] [Indexed: 06/16/2024]
Abstract
Epistasis refers to nonallelic interaction between genes that causes bias in estimates of genetic parameters for a phenotype with interactions of two or more genes affecting the same trait. Partitioning of epistatic effects allows true estimation of the genetic parameters affecting phenotypes. Multigenic variation plays a central role in the evolution of complex characteristics, among which pleiotropy, where a single gene affects several phenotypic characters, has a large influence. While pleiotropic interactions provide functional specificity, they increase the challenge of gene discovery and functional analysis. Overcoming pleiotropy-based phenotypic trade-offs offers potential for assisting breeding for complex traits. Modelling higher order nonallelic epistatic interaction, pleiotropy and non-pleiotropy-induced variation, and genotype × environment interaction in genomic selection may provide new paths to increase the productivity and stress tolerance of the next generation of crop cultivars. Advances in statistical models, software and algorithm developments, and genomic research have facilitated dissecting the nature and extent of pleiotropy and epistasis. We overview emerging approaches to exploit positive (and avoid negative) epistatic and pleiotropic interactions in a plant breeding context, including developing avenues of artificial intelligence, novel exploitation of large-scale genomics and phenomics data, and involvement of genes with minor effects to analyse epistatic interactions and pleiotropic quantitative trait loci, including missing heritability.
Affiliation(s)
- Pat Heslop-Harrison
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Department of Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, Leicester, UK
- Junrey Amas
  - Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- David Edwards
  - Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
6
Hu H, Li R, Zhao J, Batley J, Edwards D. Technological Development and Advances for Constructing and Analyzing Plant Pangenomes. Genome Biol Evol 2024; 16:evae081. [PMID: 38669452 PMCID: PMC11058698 DOI: 10.1093/gbe/evae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/09/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
A pangenome captures the genomic diversity for a species, derived from a collection of genetic sequences of diverse populations. Advances in sequencing technologies have given rise to three primary methods for pangenome construction and analysis: de novo assembly and comparison, reference genome-based iterative assembly, and graph-based pangenome construction. Each method presents advantages and challenges in processing varying amounts and structures of DNA sequencing data. With the emergence of high-quality genome assemblies and advanced bioinformatic tools, the graph-based pangenome is emerging as an advanced reference for exploring the biological and functional implications of genetic variations.
Affiliation(s)
- Haifei Hu
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Risheng Li
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
  - College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Junliang Zhao
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Jacqueline Batley
  - School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences, University of Western Australia, Perth, WA, Australia
  - Centre for Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
7
Martins FB, Aono AH, Moraes ADCL, Ferreira RCU, Vilela MDM, Pessoa-Filho M, Rodrigues-Motta M, Simeão RM, de Souza AP. Genome-wide family prediction unveils molecular mechanisms underlying the regulation of agronomic traits in Urochloa ruziziensis. Frontiers in Plant Science 2023; 14:1303417. [PMID: 38148869 PMCID: PMC10749977 DOI: 10.3389/fpls.2023.1303417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/28/2023]
Abstract
Tropical forage grasses, particularly those belonging to the Urochloa genus, play a crucial role in cattle production and serve as the main food source for animals in tropical and subtropical regions. The majority of these species are apomictic and tetraploid, highlighting the significance of U. ruziziensis, a sexual diploid species that can be tetraploidized for use in interspecific crosses with apomictic species. As a means to support breeding programs, our study investigates the feasibility of genome-wide family prediction in U. ruziziensis families to predict agronomic traits. Fifty half-sibling families were assessed for green matter yield, dry matter yield, regrowth capacity, leaf dry matter, and stem dry matter across different clippings established in contrasting seasons with varying available water capacity. Genotyping was performed using a genotyping-by-sequencing approach based on DNA samples from family pools. In addition to conventional genomic prediction methods, machine learning and feature selection algorithms were employed to reduce the necessary number of markers for prediction and enhance predictive accuracy across phenotypes. To explore the regulation of agronomic traits, our study evaluated the significance of selected markers for prediction using a tree-based approach, potentially linking these regions to quantitative trait loci (QTLs). In a multiomic approach, genes from the species transcriptome were mapped and correlated to those markers. A gene coexpression network was modeled with gene expression estimates from a diverse set of U. ruziziensis genotypes, enabling a comprehensive investigation of molecular mechanisms associated with these regions. The heritabilities of the evaluated traits ranged from 0.44 to 0.92. A total of 28,106 filtered SNPs were used to predict phenotypic measurements, achieving a mean predictive ability of 0.762. 
By employing feature selection techniques, we could reduce the dimensionality of SNP datasets, revealing potential genotype-phenotype associations. The functional annotation of genes near these markers revealed associations with auxin transport and biosynthesis of lignin, flavonol, and folic acid. Further exploration with the gene coexpression network uncovered associations with DNA metabolism, stress response, and circadian rhythm. These genes and regions represent important targets for expanding our understanding of the metabolic regulation of agronomic traits and offer valuable insights applicable to species breeding. Our work represents an innovative contribution to molecular breeding techniques for tropical forages, presenting a viable marker-assisted breeding approach and identifying target regions for future molecular studies on these agronomic traits.
Affiliation(s)
- Felipe Bitencourt Martins
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Alexandre Hild Aono
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Aline da Costa Lima Moraes
  - Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
- Marco Pessoa-Filho
  - Embrapa Cerrados, Brazilian Agricultural Research Corporation, Brasília, Brazil
- Rosangela Maria Simeão
  - Embrapa Gado de Corte, Brazilian Agricultural Research Corporation, Campo Grande, Mato Grosso, Brazil
- Anete Pereira de Souza
  - Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
  - Department of Plant Biology, Biology Institute, University of Campinas (UNICAMP), Campinas, São Paulo, Brazil
8
Chen C, Powell O, Dinglasan E, Ross EM, Yadav S, Wei X, Atkin F, Deomano E, Hayes BJ. Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non-additive variation for key traits. The Plant Genome 2023; 16:e20390. [PMID: 37728221 DOI: 10.1002/tpg2.20390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 08/01/2023] [Accepted: 08/29/2023] [Indexed: 09/21/2023]
Abstract
Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set used was 2912 sugarcane clones, scored for 26,086 genome-wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared prediction accuracy of these methods to genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models have comparable predictive ability with the additive GBLUP model but are less accurate than the extended GBLUP model.
Affiliation(s)
- Chensong Chen
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Owen Powell
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Eric Dinglasan
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Elizabeth M Ross
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Seema Yadav
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
- Ben J Hayes
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
9
Weber SE, Chawla HS, Ehrig L, Hickey LT, Frisch M, Snowdon RJ. Accurate prediction of quantitative traits with failed SNP calls in canola and maize. Frontiers in Plant Science 2023; 14:1221750. [PMID: 37936929 PMCID: PMC10627008 DOI: 10.3389/fpls.2023.1221750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 10/05/2023] [Indexed: 11/09/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores the biological causes for failed calls (for example, deletions), and there is increasing evidence that gene presence-absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker-trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely caused by biological reasons. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches.
The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform predictions with only SNP-based prediction due to redundancy in genomic relationship estimates.
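One way to picture the idea of treating failed calls as data rather than noise is to recode the genotype matrix as a failed/called indicator. The sketch below assumes a simple 0/1 encoding with `None` marking a failed call. It does not reproduce the study's two filtering pipelines, and all names and values are hypothetical.

```python
def failed_call_markers(genotypes):
    """Return a 0/1 matrix with 1 where the SNP call failed (None), plus the failure rate per SNP."""
    indicator = [[1 if call is None else 0 for call in line] for line in genotypes]
    n_lines = len(indicator)
    # zip(*indicator) iterates over SNP columns across all lines.
    rates = [sum(column) / n_lines for column in zip(*indicator)]
    return indicator, rates


# Hypothetical calls for three lines at four SNPs; None marks a failed call.
genotypes = [
    [0, 2, None, 1],
    [0, None, None, 2],
    [2, 2, 0, 1],
]
indicator, rates = failed_call_markers(genotypes)
print(indicator)
print(rates)
```

The indicator matrix could then be passed to a prediction model in place of, or alongside, the SNP matrix. A SNP whose calls fail in many related lines is a candidate for a biological cause such as a deletion rather than a purely technical error.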
Affiliation(s)
- Sven E. Weber
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lennard Ehrig
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Lee T. Hickey
  - Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Matthias Frisch
  - Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
  - Department of Plant Breeding, Justus Liebig University, Giessen, Germany
10
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in genomic estimated breeding values estimation, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there's no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Isidore Houaga
  - Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
11
Weber SE, Frisch M, Snowdon RJ, Voss-Fels KP. Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets. Frontiers in Plant Science 2023; 14:1217589. [PMID: 37731980 PMCID: PMC10507710 DOI: 10.3389/fpls.2023.1217589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023]
Abstract
In modern plant breeding, genomic selection is becoming the gold standard for selection of superior genotypes. The basis for genomic prediction models is a set of phenotyped lines along with their genotypic profile. With high marker density and linkage disequilibrium (LD) between markers, genotype data in breeding populations tends to exhibit considerable redundancy. Therefore, interest is growing in the use of haplotype blocks to overcome redundancy by summarizing co-inherited features. Moreover, haplotype blocks can help to capture local epistasis caused by interacting loci. Here, we compared genomic prediction methods that either used single SNPs or haplotype blocks with regard to their prediction accuracy for important traits in crop datasets. We used four published datasets from canola, maize, wheat and soybean. Different approaches to construct haplotype blocks were compared, including blocks based on LD, physical distance, number of adjacent markers and the algorithms implemented in the software "Haploview" and "HaploBlocker". The tested prediction methods included Genomic Best Linear Unbiased Prediction (GBLUP), Extended GBLUP to account for additive by additive epistasis (EGBLUP), Bayesian LASSO and Reproducing Kernel Hilbert Space (RKHS) regression. We found improved prediction accuracy in some traits when using haplotype blocks compared to SNP-based predictions; however, the magnitude of improvement was very trait- and model-specific. Especially in settings with low marker density, haplotype blocks can improve genomic prediction accuracy. In most cases, physically large haplotype blocks yielded a strong decrease in prediction accuracy. Especially when prediction accuracy varies greatly across different prediction models, prediction based on haplotype blocks can improve prediction accuracy of underperforming models. However, there is no "best" method to build haplotype blocks, since prediction accuracy varied considerably across methods and traits.
Hence, criteria used to define haplotype blocks should not be viewed as fixed biological parameters, but rather as hyperparameters that need to be adjusted for every dataset.
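To make the "number of adjacent markers" idea from the abstract above concrete, here is a minimal sketch of how fixed-size blocks could be built and recoded as haplotype alleles. It is not the authors' implementation: the function names, the 0/2 genotype coding for inbred lines, and the `block_size` parameter are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def blocks_by_marker_count(n_markers: int, block_size: int) -> List[Tuple[int, int]]:
    """Split marker indices 0..n_markers-1 into consecutive half-open windows."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [
        (start, min(start + block_size, n_markers))
        for start in range(0, n_markers, block_size)
    ]


def encode_haplotype_blocks(
    genotypes: List[List[int]], block_size: int
) -> List[List[int]]:
    """Replace each window of SNP calls with an integer label per distinct pattern.

    genotypes[i][j] is the call of line i at marker j (hypothetical 0/2 coding).
    Labels are assigned in order of first appearance within each block.
    """
    n_markers = len(genotypes[0]) if genotypes else 0
    encoded: List[List[int]] = [[] for _ in genotypes]
    for start, end in blocks_by_marker_count(n_markers, block_size):
        labels: Dict[Tuple[int, ...], int] = {}
        for row, line in zip(encoded, genotypes):
            pattern = tuple(line[start:end])
            # A pattern seen for the first time gets the next free label.
            row.append(labels.setdefault(pattern, len(labels)))
    return encoded


# Three toy lines, five markers, blocks of two adjacent markers.
toy = [
    [0, 0, 2, 2, 0],
    [0, 0, 2, 0, 0],
    [2, 2, 2, 2, 2],
]
print(encode_haplotype_blocks(toy, block_size=2))
```

Treating `block_size` as a tunable value, rather than a fixed constant, mirrors the abstract's closing point that block-definition criteria behave like hyperparameters to be adjusted per dataset.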
Affiliation(s)
- Sven E. Weber
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Matthias Frisch
- Department of Biometry and Population Genetics, Justus Liebig University, Giessen, Germany
- Rod J. Snowdon
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Kai P. Voss-Fels
- Institute for Grapevine Breeding, Hochschule Geisenheim University, Geisenheim, Germany
|
12
|
Song B, Ning W, Wei D, Jiang M, Zhu K, Wang X, Edwards D, Odeny DA, Cheng S. Plant genome resequencing and population genomics: Current status and future prospects. MOLECULAR PLANT 2023; 16:1252-1268. [PMID: 37501370 DOI: 10.1016/j.molp.2023.07.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Revised: 05/30/2023] [Accepted: 07/25/2023] [Indexed: 07/29/2023]
Abstract
Advances in DNA sequencing technology have sparked a genomics revolution, driving breakthroughs in plant genetics and crop breeding. Recently, the focus has shifted from cataloging genetic diversity in plants to exploring their functional significance and delivering beneficial alleles for crop improvement. This transformation has been facilitated by the increasing adoption of whole-genome resequencing. In this review, we summarize the current progress of population-based genome resequencing studies and how these studies affect crop breeding. A total of 187 land plants from 163 countries have been resequenced, comprising 54 413 accessions. As part of resequencing efforts 367 traits have been surveyed and 86 genome-wide association studies have been conducted. Economically important crops, particularly cereals, vegetables, and legumes, have dominated the resequencing efforts, leaving a gap in 49 orders, including Lycopodiales, Liliales, Acorales, Austrobaileyales, and Commelinales. The resequenced germplasm is distributed across diverse geographic locations, providing a global perspective on plant genomics. We highlight genes that have been selected during domestication, or associated with agronomic traits, and form a repository of candidate genes for future research and application. Despite the opportunities for cross-species comparative genomics, many population genomic datasets are not accessible, impeding secondary analyses. We call for a more open and collaborative approach to population genomics that promotes data sharing and encourages contribution-based credit policy. The number of plant genome resequencing studies will continue to rise with the decreasing DNA sequencing costs, coupled with advances in analysis and computational technologies. This expansion, in terms of both scale and quality, holds promise for deeper insights into plant trait genetics and breeding design.
Affiliation(s)
- Bo Song
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Weidong Ning
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; Huazhong Agricultural University, College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Wuhan, Hubei, China
- Di Wei
- Biotechnology Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 53007, China
- Mengyun Jiang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
- Kun Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
- Xingwei Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475004, China; Shenzhen Research Institute of Henan University, Shenzhen 518000, China
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Damaris A Odeny
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) - Eastern and Southern Africa, Nairobi, Kenya
- Shifeng Cheng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
|
13
|
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
|
14
|
Amas JC, Thomas WJW, Zhang Y, Edwards D, Batley J. Key Advances in the New Era of Genomics-Assisted Disease Resistance Improvement of Brassica Species. PHYTOPATHOLOGY 2023:PHYTO08220289FI. [PMID: 36324059 DOI: 10.1094/phyto-08-22-0289-fi] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Disease resistance improvement remains a major focus in breeding programs as diseases continue to devastate Brassica production systems due to intensive cultivation and climate change. Genomics has paved the way to understand the complex genomes of Brassicas, which has been pivotal in the dissection of the genetic underpinnings of agronomic traits driving the development of superior cultivars. The new era of genomics-assisted disease resistance breeding has been marked by the development of high-quality genome references, accelerating the identification of disease resistance genes controlling both qualitative (major) gene and quantitative resistance. This facilitates the development of molecular markers for marker assisted selection and enables genome editing approaches for targeted gene manipulation to enhance the genetic value of disease resistance traits. This review summarizes the key advances in the development of genomic resources for Brassica species, focusing on improved genome references, based on long-read sequencing technologies and pangenome assemblies. This is further supported by the advances in pathogen genomics, which have resulted in the discovery of pathogenicity factors, complementing the mining of disease resistance genes in the host. Recognizing the co-evolutionary arms race between the host and pathogen, it is critical to identify novel resistance genes using crop wild relatives and synthetic cultivars or through genetic manipulation via genome-editing to sustain the development of superior cultivars. Integrating these key advances with new breeding techniques and improved phenotyping using advanced data analysis platforms will make disease resistance improvement in Brassica species more efficient and responsive to current and future demands.
Affiliation(s)
- Junrey C Amas
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia 6001
- William J W Thomas
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia 6001
- Yueqi Zhang
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia 6001
- David Edwards
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia 6001
- Jacqueline Batley
- School of Biological Sciences and The UWA Institute of Agriculture, The University of Western Australia, Perth, WA, Australia 6001
|
15
|
Liang M, Cao S, Deng T, Du L, Li K, An B, Du Y, Xu L, Zhang L, Gao X, Li J, Guo P, Gao H. MAK: a machine learning framework improved genomic prediction via multi-target ensemble regressor chains and automatic selection of assistant traits. Brief Bioinform 2023; 24:7031157. [PMID: 36752363 DOI: 10.1093/bib/bbad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 01/13/2023] [Accepted: 01/20/2023] [Indexed: 02/09/2023]
Abstract
Incorporating genotypic and phenotypic information on correlated traits into a multi-trait model can significantly improve the prediction accuracy of the target trait in animal and plant breeding, as well as in human genetics. However, in most cases, phenotypic information for both the correlated and the target traits is missing for the individual to be evaluated, particularly for newborns. Therefore, we propose a machine learning framework, MAK, that improves the prediction accuracy of the target trait by constructing multi-target ensemble regressor chains and automatically selecting assistant traits, and that predicts the genomic estimated breeding values of the target trait using genotypic information only. The prediction ability of MAK was significantly more robust than that of genomic best linear unbiased prediction, BayesB, BayesRR and the multi-trait Bayesian method in four real animal and plant datasets, and MAK was roughly 100 times faster than BayesB and BayesRR.
Affiliation(s)
- Mang Liang
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Sheng Cao
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Tianyu Deng
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lili Du
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Keanning Li
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Bingxing An
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Yueying Du
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lingyang Xu
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Lupei Zhang
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Xue Gao
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Junya Li
- Chinese Academy of Agricultural Sciences Institute of Animal Science
- Huijiang Gao
- Chinese Academy of Agricultural Sciences Institute of Animal Science
|
16
|
Neik TX, Siddique KHM, Mayes S, Edwards D, Batley J, Mabhaudhi T, Song BK, Massawe F. Diversifying agrifood systems to ensure global food security following the Russia–Ukraine crisis. FRONTIERS IN SUSTAINABLE FOOD SYSTEMS 2023. [DOI: 10.3389/fsufs.2023.1124640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
The recent Russia–Ukraine conflict has raised significant concerns about global food security, leaving many countries with restricted access to imported staple food crops, particularly wheat and sunflower oil, sending food prices soaring with other adverse consequences in the food supply chain. This detrimental effect is particularly prominent for low-income countries relying on grain imports, with record-high food prices and inflation affecting their livelihoods. This review discusses the role of Russia and Ukraine in the global food system and the impact of the Russia–Ukraine conflict on food security. It also highlights how diversifying four areas of agrifood systems (markets, production, crops, and technology) can contribute to achieving food supply chain resilience for future food security and sustainability.
|
17
|
Bavykina M, Kostina N, Lee CR, Schafleitner R, Bishop-von Wettberg E, Nuzhdin SV, Samsonova M, Gursky V, Kozlov K. Modeling of Flowering Time in Vigna radiata with Artificial Image Objects, Convolutional Neural Network and Random Forest. PLANTS (BASEL, SWITZERLAND) 2022; 11:3327. [PMID: 36501364 PMCID: PMC9738219 DOI: 10.3390/plants11233327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 11/22/2022] [Accepted: 11/28/2022] [Indexed: 06/17/2023]
Abstract
Flowering time is an important target for breeders in developing new varieties adapted to changing conditions. In this work, a new approach is proposed in which the SNP markers influencing time to flowering in mung bean are selected as important features in a random forest model. The genotypic and weather data are encoded in artificial image objects, and a model for flowering time prediction is constructed as a convolutional neural network. The model uses weather data for only a limited time period of 5 days before and 20 days after planting and is capable of predicting the time to flowering with high accuracy. The most important factors for model solution were identified using saliency maps and a Score-CAM method. Our approach can help breeding programs harness genotypic and phenotypic diversity to more effectively produce varieties with a desired flowering time.
Affiliation(s)
- Maria Bavykina
- Mathematical Biology and Bioinformatics Lab, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
- Nadezhda Kostina
- Mathematical Biology and Bioinformatics Lab, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
- Cheng-Ruei Lee
- Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei 106319, Taiwan
- Eric Bishop-von Wettberg
- Department of Plant and Soil Science, Gund Institute for the Environment, University of Vermont, Burlington, VT 05405, USA
- Sergey V. Nuzhdin
- Mathematical Biology and Bioinformatics Lab, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
- Program Molecular and Computation Biology, University of California, Los Angeles, CA 90095, USA
- Maria Samsonova
- Mathematical Biology and Bioinformatics Lab, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
- Vitaly Gursky
- Theoretical Department, Ioffe Institute, 194021 Saint Petersburg, Russia
- Konstantin Kozlov
- Mathematical Biology and Bioinformatics Lab, Peter the Great St. Petersburg Polytechnic University, 195251 Saint Petersburg, Russia
|
18
|
Tirnaz S, Zandberg J, Thomas WJW, Marsh J, Edwards D, Batley J. Application of crop wild relatives in modern breeding: An overview of resources, experimental and computational methodologies. FRONTIERS IN PLANT SCIENCE 2022; 13:1008904. [PMID: 36466237 PMCID: PMC9712971 DOI: 10.3389/fpls.2022.1008904] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Accepted: 10/25/2022] [Indexed: 06/01/2023]
Abstract
Global agricultural industries are under pressure to meet the future food demand; however, the existing crop genetic diversity might not be sufficient to meet this expectation. Advances in genome sequencing technologies and the availability of reference genomes for over 300 plant species reveal the hidden genetic diversity in crop wild relatives (CWRs), which could have significant impacts on crop improvement. There are many ex situ and in situ resources around the world holding rare and valuable wild species, many of which carry agronomically important traits, and it is crucial for users to be aware of their availability. Here we aim to explore the available ex situ and in situ resources such as genebanks, botanical gardens, national parks, conservation hotspots and inventories holding CWR accessions. In addition, we highlight the advances in availability and use of CWR genomic resources, such as their contribution to pangenome construction and to introducing novel genes into crops. We also discuss the potential and challenges of modern breeding experimental approaches (e.g. de novo domestication, genome editing and speed breeding) used in CWRs and the use of computational (e.g. machine learning) approaches that could speed up utilization of CWR species in breeding programs towards crop adaptability and yield improvement.
|
19
|
Seyum EG, Bille NH, Abtew WG, Munyengwa N, Bell JM, Cros D. Genomic selection in tropical perennial crops and plantation trees: a review. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2022; 42:58. [PMID: 37313015 PMCID: PMC10248687 DOI: 10.1007/s11032-022-01326-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 09/06/2022] [Indexed: 06/15/2023]
Abstract
To overcome the multiple challenges currently faced by agriculture, such as climate change and soil deterioration, more efficient plant breeding strategies are required. Genomic selection (GS) is crucial for the genetic improvement of quantitative traits, as it can increase selection intensity, shorten the generation interval, and improve selection accuracy for traits that are difficult to phenotype. Tropical perennial crops and plantation trees are of major economic importance and have consequently been the subject of many GS articles. In this review, we discuss the factors that affect GS accuracy (statistical models, linkage disequilibrium, information concerning markers, relatedness between training and target populations, the size of the training population, and trait heritability) and the genetic gain expected in these species. The impact of GS will be particularly strong in tropical perennial crops and plantation trees as they have long breeding cycles and constrained selection intensity. Future GS prospects are also discussed. High-throughput phenotyping will allow the construction of large training populations and the implementation of phenomic selection. Optimized modeling is needed for longitudinal traits and multi-environment trials. The use of multi-omics, haploblocks, and structural variants will enable going beyond single-locus genotype data. Innovative statistical approaches, like artificial neural networks, are expected to efficiently handle the increasing amounts of heterogeneous multi-scale data. Targeted recombinations on sites identified from profiles of marker effects have the potential to further increase genetic gain. GS can also aid re-domestication and introgression breeding. Finally, GS consortia will play an important role in making the best of these opportunities. Supplementary information: the online version contains supplementary material available at 10.1007/s11032-022-01326-4.
Affiliation(s)
- Essubalew Getachew Seyum
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- Department of Horticulture and Plant Sciences, College of Agriculture and Veterinary Medicine, Jimma University, P.O. Box 307, Jimma, Ethiopia
- Ngalle Hermine Bille
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- Wosene Gebreselassie Abtew
- Department of Horticulture and Plant Sciences, College of Agriculture and Veterinary Medicine, Jimma University, P.O. Box 307, Jimma, Ethiopia
- Norman Munyengwa
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD 4072 Australia
- Joseph Martin Bell
- Department of Plant Biology and Physiology, Faculty of Sciences, University of Yaoundé I, Yaoundé, Cameroon
- David Cros
- CIRAD, UMR AGAP Institut, 34398 Montpellier, France
- UMR AGAP Institut, CIRAD, INRAE, Univ. Montpellier, Institut Agro, 34398 Montpellier, France
|
20
|
Hübner S. Are we there yet? Driving the road to evolutionary graph-pangenomics. CURRENT OPINION IN PLANT BIOLOGY 2022; 66:102195. [PMID: 35217472 DOI: 10.1016/j.pbi.2022.102195] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/27/2021] [Revised: 01/20/2022] [Accepted: 01/21/2022] [Indexed: 06/14/2023]
Abstract
With the increase in the number of sequenced genomes, it is now recognized that graph-based pangenomes can provide a comprehensive platform to study diversity in a population or species, from point mutations to large chromosomal rearrangements. By incorporating concepts from graph theory, a graph-pangenome can be studied directly to identify genomic regions and genes that underlie important evolutionary processes and traits. Here, I discuss how basic concepts in graph theory can be implemented to address questions in evolutionary genomics and guide future breeding efforts. Despite its compelling versatility, graph-pangenome assembly is still challenging, especially in species with large, complex genomes. As technology is rapidly improving, the graph-pangenome is expected to become a central platform in genomics studies and applications. Thus, the development of tools and methods that exploit the graph structure is urgently needed to pave the route to evolutionary graph-pangenomics.
Affiliation(s)
- Sariel Hübner
- Galilee Research Institute (Migal), Tel-Hai Academic College, Upper Galilee, 12210, Israel
|
21
|
Pangenomics in Microbial and Crop Research: Progress, Applications, and Perspectives. Genes (Basel) 2022; 13:598. [PMID: 35456404 PMCID: PMC9031676 DOI: 10.3390/genes13040598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/14/2022] [Revised: 03/16/2022] [Accepted: 03/25/2022] [Indexed: 01/25/2023]
Abstract
Advances in sequencing technologies and bioinformatics tools have fueled a renewed interest in whole genome sequencing efforts in many organisms. The growing availability of multiple genome sequences has advanced our understanding of within-species diversity, in the form of a pangenome. Pangenomics has opened new avenues for future research, such as allowing dissection of complex molecular mechanisms and increased confidence in genome mapping. To comprehensively capture the genetic diversity for improving plant performance, the pangenome concept is further extended from species to genus level by the inclusion of wild species, constituting a super-pangenome. Characterization of the pangenome has implications for both basic and applied research. The concept of the pangenome has transformed the way biological questions are addressed. From understanding evolution and adaptation to elucidating host–pathogen interactions, and from finding novel genes or breeding targets to aid crop improvement to designing effective vaccines for human prophylaxis, the increasing availability of pangenomes has revolutionized several aspects of biological research. The future availability of high-resolution pangenomes based on reference-level near-complete genome assemblies would greatly improve our ability to address complex biological problems.
|
22
|
Tay Fernandez CG, Nestor BJ, Danilevicz MF, Gill M, Petereit J, Bayer PE, Finnegan PM, Batley J, Edwards D. Pangenomes as a Resource to Accelerate Breeding of Under-Utilised Crop Species. Int J Mol Sci 2022; 23:2671. [PMID: 35269811 PMCID: PMC8910360 DOI: 10.3390/ijms23052671] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/01/2022] [Revised: 02/21/2022] [Accepted: 02/21/2022] [Indexed: 02/01/2023]
Abstract
Pangenomes are a rich resource to examine the genomic variation observed within a species or genus, supporting population genetics studies, with applications for the improvement of crop traits. Major crop species such as maize (Zea mays), rice (Oryza sativa), Brassica (Brassica spp.), and soybean (Glycine max) have had pangenomes constructed and released, and this has led to the discovery of valuable genes associated with disease resistance and yield components. However, pangenome data are not available for many less prominent crop species that are currently under-utilised. Despite many under-utilised species being important food sources in regional populations, the scarcity of genomic data for these species hinders their improvement. Here, we assess several under-utilised crops and review the pangenome approaches that could be used to build resources for their improvement. Many of these under-utilised crops are cultivated in arid or semi-arid environments, suggesting that novel genes related to drought tolerance may be identified and used for introgression into related major crop species. In addition, we discuss how previously collected data could be used to enrich pangenome functional analysis in genome-wide association studies (GWAS) based on studies in major crops. Considering the technological advances in genome sequencing, pangenome references for under-utilised species are becoming more obtainable, offering the opportunity to identify novel genes related to agro-morphological traits in these species.
Affiliation(s)
- David Edwards
- School of Biological Sciences, The University of Western Australia, Perth, WA 6009, Australia; (C.G.T.F.); (B.J.N.); (M.F.D.); (M.G.); (J.P.); (P.E.B.); (P.M.F.); (J.B.)
|
23
|
Tay Fernandez CG, Marsh JI, Nestor BJ, Gill M, Golicz AA, Bayer PE, Edwards D. An SGSGeneloss-Based Method for Constructing a Gene Presence-Absence Table Using Mosdepth. Methods Mol Biol 2022; 2512:73-80. [PMID: 35818000 DOI: 10.1007/978-1-0716-2429-6_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Presence-absence variants (PAV) are genomic regions present in some individuals of a species, but not others. PAVs have been shown to contribute to genomic diversity, especially in bacteria and plants. These structural variations have been linked to traits and can be used to track a species' evolutionary history. PAVs are usually called by aligning short read sequence data from one or more individuals to a reference genome or pangenome assembly, and then comparing coverage. Regions where reads do not align define absence in that individual, and the regions are classified as PAVs. The method below details how to align sequence reads to a reference and how to use the sequencing-coverage calculator Mosdepth to identify PAVs and construct a PAV table for use in downstream comparative genome analysis.
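The coverage-comparison step described in the abstract above can be sketched as follows. This is an illustrative simplification, not the chapter's protocol: it assumes each sample's per-gene coverage has already been exported as tab-separated lines of the form `chrom, start, end, gene, mean_coverage`, and the helper names and the `min_cov` threshold are hypothetical.

```python
from typing import Dict, Iterable


def gene_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """Parse per-gene mean coverage from tab-separated lines.

    Assumed columns: chrom, start, end, gene name, mean coverage.
    """
    coverage: Dict[str, float] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _chrom, _start, _end, gene, mean = line.rstrip("\n").split("\t")
        coverage[gene] = float(mean)
    return coverage


def pav_table(
    samples: Dict[str, Dict[str, float]], min_cov: float = 1.0
) -> Dict[str, Dict[str, int]]:
    """Build a gene-by-sample table: 1 = present, 0 = absent.

    A gene is called present when its mean coverage reaches min_cov
    (an assumed cut-off); genes missing from a sample count as absent.
    """
    genes = sorted({g for cov in samples.values() for g in cov})
    return {
        gene: {
            sample: int(cov.get(gene, 0.0) >= min_cov)
            for sample, cov in samples.items()
        }
        for gene in genes
    }


# Two toy samples: geneB has no aligned reads in sample_a.
sample_a = gene_coverage(["chr1\t0\t100\tgeneA\t12.5", "chr1\t200\t300\tgeneB\t0.0"])
sample_b = gene_coverage(["chr1\t0\t100\tgeneA\t8.0", "chr1\t200\t300\tgeneB\t6.2"])
table = pav_table({"sample_a": sample_a, "sample_b": sample_b})
for gene, calls in table.items():
    print(gene, calls)
```

A single mean-coverage threshold is the simplest possible calling rule; a real analysis would likely tune the cut-off to sequencing depth and may use the fraction of each gene covered instead.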
Affiliation(s)
- Cassandria G Tay Fernandez
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
- Jacob I Marsh
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
- Benjamin J Nestor
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
- Mitchell Gill
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
- Agnieszka A Golicz
- Department of Plant Breeding, Justus Liebig University Gießen, Gießen, Germany
- Philipp E Bayer
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
- David Edwards
- Applied Bioinformatics Group, School of Biological Sciences, The University of Western Australia, Perth, WA, Australia
|
24
|
Volk GM, Byrne PF, Coyne CJ, Flint-Garcia S, Reeves PA, Richards C. Integrating Genomic and Phenomic Approaches to Support Plant Genetic Resources Conservation and Use. PLANTS (BASEL, SWITZERLAND) 2021; 10:2260. [PMID: 34834625 PMCID: PMC8619436 DOI: 10.3390/plants10112260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2021] [Revised: 10/20/2021] [Accepted: 10/20/2021] [Indexed: 05/17/2023]
Abstract
Plant genebanks provide genetic resources for breeding and research programs worldwide. These programs benefit from having access to high-quality, standardized phenotypic and genotypic data. Technological advances have made it possible to collect phenomic and genomic data for genebank collections, which, with the appropriate analytical tools, can directly inform breeding programs. We discuss the importance of considering genebank accession homogeneity and heterogeneity in data collection and documentation. Citing specific examples, we describe how well-documented genomic and phenomic data have met or could meet the needs of plant genetic resource managers and users. We explore future opportunities that may emerge from improved documentation and data integration among plant genetic resource information systems.
Affiliation(s)
- Gayle M. Volk
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Patrick F. Byrne
- Department of Soil and Crop Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Clarice J. Coyne
- United States Department of Agriculture, Agricultural Research Service, Western Regional Plant Introduction Station, Pullman, WA 99164, USA
- Sherry Flint-Garcia
- Plant Genetics Research Unit, United States Department of Agriculture, Agricultural Research Service, Columbia, MO 65211, USA
- Patrick A. Reeves
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Chris Richards
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
|