1. Wiersma AT, Hamilton JP, Vaillancourt B, Brose J, Awale HE, Wright EM, Kelly JD, Buell CR. k-mer genome-wide association study for anthracnose and BCMV resistance in a Phaseolus vulgaris Andean Diversity Panel. The Plant Genome 2024; 17:e20523. [PMID: 39397345 PMCID: PMC11628888 DOI: 10.1002/tpg2.20523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 09/10/2024] [Accepted: 09/11/2024] [Indexed: 10/15/2024]
Abstract
Access to broad genomic resources and closely linked marker-trait associations for common beans (Phaseolus vulgaris L.) can facilitate development of improved varieties with increased yield, improved market quality traits, and enhanced disease resistance. The emergence of virulent races of anthracnose (caused by Colletotrichum lindemuthianum) and bean common mosaic virus (BCMV) highlights the need for improved methods to identify and incorporate pan-genomic variation in breeding for disease resistance. We sequenced the P. vulgaris Andean Diversity Panel (ADP) and performed a genome-wide association study (GWAS) to identify associations for resistance to BCMV and eight races of anthracnose. Historical single nucleotide polymorphism (SNP)-chip and phenotypic data enabled a three-way comparison between SNP-chip, reference-based whole genome shotgun sequence (WGS)-SNP, and reference-free k-mer (short nucleotide subsequence) GWAS. Across all traits, there was excellent concordance between SNP-chip, WGS-SNP, and k-mer GWAS results, albeit at a much higher marker resolution for the WGS data sets. Significant k-mer haplotype variation revealed selection of the linked I-gene and Co-u traits in North American breeding lines and cultivars. Due to structural variation, only 9.1 to 47.3% of the significantly associated k-mers could be mapped to the reference genome. Thus, to determine the genetic context of cis-associated k-mers, we generated draft whole genome assemblies of four ADP accessions and identified an expanded local repertoire of disease resistance genes associated with resistance to anthracnose and BCMV. With access to variant data in the context of a pan-genome, high resolution mapping of agronomic traits for common bean is now feasible.
Affiliation(s)
- Andrew T. Wiersma
  - Archer Daniels Midland Company, New Plymouth, Idaho, USA
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan, USA
- John P. Hamilton
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
  - Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA
  - Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, USA
- Brieanne Vaillancourt
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
  - Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA
- Julia Brose
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
  - Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA
- Halima E. Awale
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- Evan M. Wright
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- James D. Kelly
  - Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, Michigan, USA
- C. Robin Buell
  - Plant Resilience Institute, Michigan State University, East Lansing, Michigan, USA
  - Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA
  - Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia, USA
  - Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia, USA
  - Institute of Plant Breeding, Genetics & Genomics, University of Georgia, Athens, Georgia, USA
  - The Plant Center, University of Georgia, Athens, Georgia, USA
2. Groza C, Chen X, Wheeler TJ, Bourque G, Goubert C. A unified framework to analyze transposable element insertion polymorphisms using graph genomes. Nat Commun 2024; 15:8915. [PMID: 39414821 PMCID: PMC11484939 DOI: 10.1038/s41467-024-53294-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 10/02/2024] [Indexed: 10/18/2024] Open
Abstract
Transposable elements are ubiquitous mobile DNA sequences generating insertion polymorphisms, contributing to genomic diversity. We present GraffiTE, a flexible pipeline to analyze polymorphic mobile element insertions. By integrating state-of-the-art structural variant detection algorithms and graph genomes, GraffiTE identifies polymorphic mobile elements from genomic assemblies or long-read sequencing data, and genotypes these variants using short or long read sets. Benchmarking on simulated and real datasets reports high precision and recall rates. GraffiTE is designed to allow non-expert users to perform comprehensive analyses, including in models with limited transposable element knowledge, and is compatible with various sequencing technologies. Here, we demonstrate the versatility of GraffiTE by analyzing human, Drosophila melanogaster, maize, and Cannabis sativa pangenome data. These analyses reveal the landscapes of polymorphic mobile elements and their frequency variations across individuals, strains, and cultivars.
Affiliation(s)
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montréal, QC, Canada
- Xun Chen
  - Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
- Travis J Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
- Guillaume Bourque
  - Institute for the Advanced Study of Human Biology (ASHBi), Kyoto University, Kyoto, Japan
  - Canadian Centre for Computational Genomics, McGill University, Montréal, QC, Canada
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
  - Human Genetics, McGill University, Montréal, QC, Canada
- Clément Goubert
  - Human Genetics, McGill University, Montréal, QC, Canada
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
3. Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Jordan M Eizenga
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Arun Das
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Katharine M Jenike
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karen H Miga
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Jean Monlong
  - Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Benedict Paten
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
4. Hwang S, Brown NK, Ahmed OY, Jenike KM, Kovaka S, Schatz MC, Langmead B. MEM-based pangenome indexing for k-mer queries. bioRxiv: the preprint server for biology 2024:2024.05.20.595044. [PMID: 38826299 PMCID: PMC11142109 DOI: 10.1101/2024.05.20.595044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO enables both queries that test k-mer presence/absence (membership queries) and that count the number of genomes containing k-mers in a window (conservation queries). MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution HPRC index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 seconds, 2.5x faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
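To make the two query types in this abstract concrete, the following is a minimal sketch of what a membership query and a conservation query compute. It deliberately uses plain per-genome k-mer sets rather than a MEM-based index, so it is not MEMO's data structure or API; the function names (`build_index`, `conservation`, `membership`) and the toy haplotypes are hypothetical, and strands are not canonicalized.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Naive index (an assumption for illustration): one k-mer set per genome."""
    return {name: kmers(seq, k) for name, seq in genomes.items()}


def window_kmers(window: str, k: int) -> List[str]:
    """List the k-mers of a query window in left-to-right order."""
    return [window[i:i + k] for i in range(len(window) - k + 1)]


def conservation(index: Dict[str, Set[str]], window: str, k: int) -> List[int]:
    """For each k-mer in the window, count the genomes that contain it."""
    return [sum(km in ks for ks in index.values())
            for km in window_kmers(window, k)]


def membership(index: Dict[str, Set[str]], window: str, k: int) -> List[bool]:
    """For each k-mer in the window, report whether any genome contains it."""
    return [count > 0 for count in conservation(index, window, k)]


# Toy pangenome of three hypothetical haplotypes.
genomes = {"hap1": "ACGTACGT", "hap2": "ACGTTCGT", "hap3": "TTTTACGT"}
index = build_index(genomes, k=4)
counts = conservation(index, "ACGTAG", k=4)
present = membership(index, "ACGTAG", k=4)
print(counts, present)  # [3, 1, 0] [True, True, False]
```

A set-based index like this is fixed to one value of k and grows with the number of distinct k-mers, which is the limitation the abstract attributes to k-mer and de Bruijn graph indexes.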
Affiliation(s)
- Stephen Hwang
  - XDBio Program, Johns Hopkins University, Baltimore MD, USA
- Nathaniel K. Brown
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Omar Y. Ahmed
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
5. Dallinger HG, Löschenberger F, Azrak N, Ametz C, Michel S, Bürstmayr H. Genome-wide association mapping for pre-harvest sprouting in European winter wheat detects novel resistance QTL, pleiotropic effects, and structural variation in multiple genomes. The Plant Genome 2024; 17:e20301. [PMID: 36851839 DOI: 10.1002/tpg2.20301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 11/20/2022] [Indexed: 06/18/2023]
Abstract
Pre-harvest sprouting (PHS), germination of seeds before harvest, is a major problem in global wheat (Triticum aestivum L.) production, and leads to reduced bread-making quality in affected grain. Breeding for PHS resistance can prevent losses under adverse conditions. Selecting resistant lines in years lacking pre-harvest rain requires challenging plants in the field or in the laboratory, or using genetic markers. Despite the availability of a wheat reference and pan-genome, linking markers, genes, allelic, and structural variation, a complete understanding of the mechanisms underlying various sources of PHS resistance is still lacking. Therefore, we challenged a population of European wheat varieties and breeding lines with PHS conditions and phenotyped them for PHS traits, grain quality, phenological and agronomic traits to conduct genome-wide association mapping. Furthermore, we compared these marker-trait associations to previously reported PHS loci and evaluated their usefulness for breeding. We found markers associated with PHS on all chromosomes, with strong evidence for novel quantitative trait locus/loci (QTL) on chromosomes 1A and 5B. The QTL on chromosome 1A lacks pleiotropic effects, whereas for the QTL on 5B we detected pleiotropic effects on phenology and grain quality. Multiple peaks on chromosome 4A co-located with the major resistance locus Phs-A1, for which two causal genes, TaPM19 and TaMKK3, have been proposed. Mapping markers and genes to the pan-genome and chromosomal alignments provide evidence for structural variation around this major PHS-resistance locus. Although PHS is controlled by many loci distributed across the wheat genome, Phs-A1 on chromosome 4A seems to be the most effective and widely deployed source of resistance in European wheat varieties.
Affiliation(s)
- Hermann G Dallinger
  - Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, IFA-Tulln, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Straße 20, Tulln, Austria
  - Saatzucht Donau GesmbH & Co KG, Saatzuchtstrasse 11, Probstdorf, Austria
- Naim Azrak
  - Saatzucht Donau GesmbH & Co KG, Saatzuchtstrasse 11, Probstdorf, Austria
- Christian Ametz
  - Saatzucht Donau GesmbH & Co KG, Saatzuchtstrasse 11, Probstdorf, Austria
- Sebastian Michel
  - Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, IFA-Tulln, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Straße 20, Tulln, Austria
- Hermann Bürstmayr
  - Institute of Biotechnology in Plant Production, Department of Agrobiotechnology, IFA-Tulln, University of Natural Resources and Life Sciences Vienna, Konrad-Lorenz-Straße 20, Tulln, Austria
6. Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogeneity for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Francesco Bruno
  - Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy; Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy; Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
- Giuseppe Passarino
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
7. Corut AK, Wallace JG. kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3 (Bethesda, Md.) 2023; 14:jkad246. [PMID: 37976215 PMCID: PMC10755180 DOI: 10.1093/g3journal/jkad246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach comes with a variety of limitations. For this reason, newer methods for GWAS have been developed, including the use of pan-genomes instead of a reference genome and the utilization of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach has especially gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adopted an existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda and eliminated the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow encompasses supplemental components such as quality control, read-trimming procedures, and generating summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
Affiliation(s)
- Adnan Kivanc Corut
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Jason G Wallace
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA 30602, USA
  - Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
8. Li S, Kong L, Xiao X, Li P, Liu A, Li J, Gong J, Gong W, Ge Q, Shang H, Pan J, Chen H, Peng Y, Zhang Y, Lu Q, Shi Y, Yuan Y. Genome-wide artificial introgressions of Gossypium barbadense into G. hirsutum reveal superior loci for simultaneous improvement of cotton fiber quality and yield traits. J Adv Res 2023; 53:1-16. [PMID: 36460274 PMCID: PMC10658236 DOI: 10.1016/j.jare.2022.11.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/18/2022] [Revised: 10/31/2022] [Accepted: 11/24/2022] [Indexed: 12/02/2022] Open
Abstract
INTRODUCTION: The simultaneous improvement of fiber quality and yield for cotton is strongly limited by the narrow genetic backgrounds of Gossypium hirsutum (Gh) and the negative genetic correlations among traits. An effective way to overcome these bottlenecks is to introgress the favorable alleles of Gossypium barbadense (Gb) for fiber quality into Gh with high yield.
OBJECTIVES: This study aimed to identify superior loci for the improvement of fiber quality and yield.
METHODS: Two sets of chromosome segment substitution lines (CSSLs) were generated by crossing Hai1 (Gb, donor parent) with cultivars CCRI36 (Gh) and CCRI45 (Gh) as genetic backgrounds, and cultivated in 6 and 8 environments, respectively. The kmer genotyping strategy was improved and applied to the population genetic analysis of 743 genomic sequencing data. A progeny segregating population was constructed to validate genetic effects of the candidate loci.
RESULTS: A total of 68,912 and 83,352 genome-wide introgressed kmers were identified in the CCRI36 and CCRI45 populations, respectively. Over 90% of introgressions were homologous exchanges and about 21% were reverse insertions. In total, 291 major introgressed segments were identified with stable genetic effects, of which 66 (22.98%), 64 (21.99%), 35 (12.03%), 31 (10.65%) and 18 (6.19%) were beneficial for the improvement of fiber length (FL), strength (FS), micronaire, lint-percentage (LP) and boll-weight, respectively. Thirty-nine introgression segments were detected with stable favorable additive effects for simultaneous improvement of 2 or more traits in the Gh genetic background, including 6 that could increase FL/FS and LP. The pyramiding effects of 3 pleiotropic segments (A07:C45Clu-081, D06:C45Clu-218, D02:C45Clu-193) were further validated in the segregating population.
CONCLUSION: Combining genome-wide introgressions with the kmer genotyping strategy showed significant advantages in exploring genetic resources. Through genome-wide comprehensive mining, a total of 11 clusters (segments) were discovered for the stable simultaneous improvement of FL/FS and LP, which deserve more attention in the future.
Affiliation(s)
- Shaoqi Li
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China; Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Linglei Kong
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Xianghui Xiao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Pengtao Li
  - School of Biotechnology and Food Engineering, Anyang Institute of Technology, Anyang 455000, China
- Aiying Liu
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Junwen Li
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Juwu Gong
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Wankui Gong
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Qun Ge
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Haihong Shang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Jingtao Pan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Hong Chen
  - Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi 832000, China
- Yan Peng
  - Third Division of the Xinjiang Production and Construction Corps Agricultural Research Institute, Tumushuke 843900, China
- Yuanming Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Quanwei Lu
  - School of Biotechnology and Food Engineering, Anyang Institute of Technology, Anyang 455000, China
- Yuzhen Shi
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Youlu Yuan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China; Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
9. Aylward AJ, Petrus S, Mamerto A, Hartwick NT, Michael TP. PanKmer: k-mer-based and reference-free pangenome analysis. Bioinformatics 2023; 39:btad621. [PMID: 37846049 PMCID: PMC10603592 DOI: 10.1093/bioinformatics/btad621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 08/29/2023] [Accepted: 10/13/2023] [Indexed: 10/18/2023] Open
Abstract
SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias.
AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as Gitlab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
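The presence-absence table described in this abstract can be illustrated with a small sketch. This is not PanKmer's index format or its Python API; it is a toy version, assuming each k-mer maps to an integer bitmask with one bit per genome, and it computes a Jaccard similarity as one example of a whole-genome similarity statistic. The names `presence_absence` and `jaccard` and the toy sequences are hypothetical.

```python
from typing import Dict


def presence_absence(genomes: Dict[str, str], k: int) -> Dict[str, int]:
    """Map each observed k-mer to a bitmask; bit i is set if genome i has it."""
    table: Dict[str, int] = {}
    for bit, seq in enumerate(genomes.values()):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            table[kmer] = table.get(kmer, 0) | (1 << bit)
    return table


def jaccard(table: Dict[str, int], a: int, b: int) -> float:
    """Jaccard similarity of the k-mer sets of genomes a and b (bit positions)."""
    shared = sum(1 for m in table.values() if (m >> a) & 1 and (m >> b) & 1)
    either = sum(1 for m in table.values() if (m >> a) & 1 or (m >> b) & 1)
    return shared / either if either else 0.0


# Two hypothetical genomes that differ by a single substitution.
genomes = {"genomeA": "ACGTAC", "genomeB": "ACGTTC"}
table = presence_absence(genomes, k=3)
similarity = jaccard(table, 0, 1)
print(table["ACG"], table["GTA"], table["GTT"])  # 3 1 2
```

In this toy case a single substitution changes two of the four 3-mers in each genome, which is why one table of k-mers can reflect SNPs, INDELs, and SVs without calling those variants explicitly.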
Affiliation(s)
- Anthony J Aylward
  - The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Semar Petrus
  - The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Allen Mamerto
  - The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Nolan T Hartwick
  - The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Todd P Michael
  - The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
10. Karikari B, Lemay MA, Belzile F. k-mer-Based Genome-Wide Association Studies in Plants: Advances, Challenges, and Perspectives. Genes (Basel) 2023; 14:1439. [PMID: 37510343 PMCID: PMC10379394 DOI: 10.3390/genes14071439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Genome-wide association studies (GWAS) have allowed the discovery of marker-trait associations in crops over recent decades. However, their power is hampered by a number of limitations, with the key one among them being an overreliance on single-nucleotide polymorphisms (SNPs) as molecular markers. Indeed, SNPs represent only one type of genetic variation and are usually derived from alignment to a single genome assembly that may be poorly representative of the population under study. To overcome this, k-mer-based GWAS approaches have recently been developed. k-mer-based GWAS provide a universal way to assess variation due to SNPs, insertions/deletions, and structural variations without having to specifically detect and genotype these variants. In addition, k-mer-based analyses can be used in species that lack a reference genome. However, the use of k-mers for GWAS presents challenges such as data size and complexity, lack of standard tools, and potential detection of false associations. Nevertheless, efforts are being made to overcome these challenges and a general analysis workflow has started to emerge. We identify the priorities for k-mer-based GWAS in years to come, notably in the development of user-friendly programs for their analysis and approaches for linking significant k-mers to sequence variation.
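The core idea of a k-mer-based association test can be sketched in a few lines: each k-mer is treated as a presence/absence marker across samples and tested against the phenotype. The example below is a simplified illustration under stated assumptions, not the method of any cited tool: it assumes a binary phenotype and uses a two-sided Fisher's exact test written from scratch, with no correction for population structure or multiple testing, both of which real analyses need. The function names and the toy data are hypothetical.

```python
from math import comb
from typing import Dict, List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def kmer_association(presence: Dict[str, List[int]],
                     phenotype: List[int]) -> Dict[str, float]:
    """Test each k-mer's presence/absence pattern against a 0/1 phenotype."""
    pvalues = {}
    for kmer, calls in presence.items():
        a = sum(1 for p, y in zip(calls, phenotype) if p and y)
        b = sum(1 for p, y in zip(calls, phenotype) if p and not y)
        c = sum(1 for p, y in zip(calls, phenotype) if not p and y)
        d = sum(1 for p, y in zip(calls, phenotype) if not p and not y)
        pvalues[kmer] = fisher_exact_two_sided(a, b, c, d)
    return pvalues


# Eight hypothetical samples: 1 = resistant, 0 = susceptible.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
presence = {
    "ACGTA": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the phenotype exactly
    "TTGCA": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the phenotype
}
pvalues = kmer_association(presence, phenotype)
print(pvalues)
```

Because no variant calling or reference alignment happens before the test, significant k-mers still have to be linked back to sequence variation afterwards, which is the follow-up step the abstract identifies as a priority.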
Affiliation(s)
- Benjamin Karikari
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
  - Department of Agricultural Biotechnology, Faculty of Agriculture, Food and Consumer Sciences, University for Development Studies, Tamale P.O. Box TL 1882, Ghana
- Marc-André Lemay
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- François Belzile
  - Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
11. Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
12
Kumar M, Kumar S, Sandhu KS, Kumar N, Saripalli G, Prakash R, Nambardar A, Sharma H, Gautam T, Balyan HS, Gupta PK. GWAS and genomic prediction for pre-harvest sprouting tolerance involving sprouting score and two other related traits in spring wheat. MOLECULAR BREEDING: NEW STRATEGIES IN PLANT IMPROVEMENT 2023; 43:14. [PMID: 37313293 PMCID: PMC10248620 DOI: 10.1007/s11032-023-01357-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 01/26/2023] [Indexed: 06/15/2023]
Abstract
In wheat, a genome-wide association study (GWAS) and genomic prediction (GP) analysis were conducted for pre-harvest sprouting (PHS) tolerance and two of its related traits. For this purpose, an association panel of 190 accessions was phenotyped for PHS (using sprouting score), falling number, and grain color over two years and genotyped with 9904 DArTseq-based SNP markers. GWAS for main-effect quantitative trait nucleotides (M-QTNs) using three different models (CMLM, SUPER, and FarmCPU) and epistatic QTNs (E-QTNs) using PLINK were performed. A total of 171 M-QTNs (CMLM, 47; SUPER, 70; FarmCPU, 54) for all three traits, and 15 E-QTNs involved in 20 first-order epistatic interactions were identified. Some of the above QTNs overlapped the previously reported QTLs, MTAs, and cloned genes, allowing the delineation of 26 PHS-responsive genomic regions spread over 16 wheat chromosomes. As many as 20 definitive and stable QTNs were considered important for use in marker-assisted recurrent selection (MARS). The gene, TaPHS1, for PHS tolerance (PHST) associated with one of the QTNs was also validated using the KASP assay. Some of the M-QTNs were shown to have a key role in the abscisic acid pathway involved in PHST. Genomic prediction accuracies (based on the cross-validation approach) using three different models ranged from 0.41 to 0.55, which are comparable to the results of previous studies. In summary, the results of the present study improved our understanding of the genetic architecture of PHST and its related traits in wheat and provided novel genomic resources for wheat breeding based on MARS and GP.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-023-01357-5.
Affiliation(s)
- Manoj Kumar
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Sachin Kumar
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Neeraj Kumar
  - Department of Plant and Environmental Sciences, Clemson University, Clemson, SC USA
- Gautam Saripalli
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
  - Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD USA
- Ram Prakash
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Akash Nambardar
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Hemant Sharma
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Tinku Gautam
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Harindra Singh Balyan
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
- Pushpendra Kumar Gupta
  - Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut, UP India
13
Gangurde SS, Xavier A, Naik YD, Jha UC, Rangari SK, Kumar R, Reddy MSS, Channale S, Elango D, Mir RR, Zwart R, Laxuman C, Sudini HK, Pandey MK, Punnuri S, Mendu V, Reddy UK, Guo B, Gangarao NVPR, Sharma VK, Wang X, Zhao C, Thudi M. Two decades of association mapping: Insights on disease resistance in major crops. FRONTIERS IN PLANT SCIENCE 2022; 13:1064059. [PMID: 37082513 PMCID: PMC10112529 DOI: 10.3389/fpls.2022.1064059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/07/2022] [Accepted: 11/10/2022] [Indexed: 05/03/2023]
Abstract
Climate change across the globe has an impact on the occurrence, prevalence, and severity of plant diseases. About 30% of yield losses in major crops are due to plant diseases; emerging diseases are likely to further undermine sustainable production in the coming years. Plant diseases have led to increased hunger and mass migration of human populations in the past and thus pose a serious threat to global food security. Equipping modern varieties/hybrids with enhanced genetic resistance is the most economical, sustainable and environmentally friendly solution. Plant geneticists have done tremendous work in identifying stable resistance in primary genepools, and often beyond them, to breed resistant varieties in different major crops. Over the last two decades, the availability of crop and pathogen genomes due to advances in next-generation sequencing technologies has improved our understanding of trait genetics using different approaches. Genome-wide association studies have been effectively used to identify candidate genes and map loci associated with different diseases in crop plants. In this review, we highlight successful examples of the discovery of resistance genes to many important diseases. In addition, we cover major developments in association studies, statistical models and bioinformatic tools that improve the power, resolution and efficiency of identifying marker-trait associations. Overall, this review provides comprehensive insights into two decades of advances in GWAS and discusses the challenges and opportunities this research area provides for breeding resistant varieties.
Affiliation(s)
- Sunil S. Gangurde
  - Crop Genetics and Breeding Research, United States Department of Agriculture (USDA) - Agriculture Research Service (ARS), Tifton, GA, United States
  - Department of Plant Pathology, University of Georgia, Tifton, GA, United States
- Alencar Xavier
  - Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Uday Chand Jha
  - Indian Council of Agricultural Research (ICAR), Indian Institute of Pulses Research (IIPR), Kanpur, Uttar Pradesh, India
- Raj Kumar
  - Dr. Rajendra Prasad Central Agricultural University (RPCAU), Bihar, India
- M. S. Sai Reddy
  - Dr. Rajendra Prasad Central Agricultural University (RPCAU), Bihar, India
- Sonal Channale
  - Crop Health Center, University of Southern Queensland (USQ), Toowoomba, QLD, Australia
- Dinakaran Elango
  - Department of Agronomy, Iowa State University, Ames, IA, United States
- Reyazul Rouf Mir
  - Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences and Technology (SKUAST), Sopore, India
- Rebecca Zwart
  - Crop Health Center, University of Southern Queensland (USQ), Toowoomba, QLD, Australia
- C. Laxuman
  - Zonal Agricultural Research Station (ZARS), Kalaburagi, University of Agricultural Sciences, Raichur, Karnataka, India
- Hari Kishan Sudini
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Manish K. Pandey
  - Crop Health Center, University of Southern Queensland (USQ), Toowoomba, QLD, Australia
  - International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Somashekhar Punnuri
  - College of Agriculture, Family Sciences and Technology, Dr. Fort Valley State University, Fort Valley, GA, United States
- Venugopal Mendu
  - Department of Plant Science and Plant Pathology, Montana State University, Bozeman, MT, United States
- Umesh K. Reddy
  - Department of Biology, West Virginia State University, West Virginia, WV, United States
- Baozhu Guo
  - Crop Genetics and Breeding Research, United States Department of Agriculture (USDA) - Agriculture Research Service (ARS), Tifton, GA, United States
- Vinay K. Sharma
  - Dr. Rajendra Prasad Central Agricultural University (RPCAU), Bihar, India
- Xingjun Wang
  - Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences (SAAS), Jinan, China
- Chuanzhi Zhao
  - Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences (SAAS), Jinan, China
- Mahendar Thudi
  - Dr. Rajendra Prasad Central Agricultural University (RPCAU), Bihar, India
  - Crop Health Center, University of Southern Queensland (USQ), Toowoomba, QLD, Australia
  - Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences (SAAS), Jinan, China
14
Belzile F, Jean M, Torkamaneh D, Tardivel A, Lemay MA, Boudhrioua C, Arsenault-Labrecque G, Dussault-Benoit C, Lebreton A, de Ronne M, Tremblay V, Labbé C, O’Donoughue L, St-Amour VTB, Copley T, Fortier E, Ste-Croix DT, Mimee B, Cober E, Rajcan I, Warkentin T, Gagnon É, Legay S, Auclair J, Bélanger R. The SoyaGen Project: Putting Genomics to Work for Soybean Breeders. FRONTIERS IN PLANT SCIENCE 2022; 13:887553. [PMID: 35557742 PMCID: PMC9087807 DOI: 10.3389/fpls.2022.887553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 03/24/2022] [Indexed: 06/15/2023]
Abstract
The SoyaGen project was a collaborative endeavor involving Canadian soybean researchers and breeders from academia and the private sector as well as international collaborators. Its aims were to develop genomics-derived solutions to real-world challenges faced by breeders. Based on the needs expressed by the stakeholders, the research efforts were focused on maximizing realized yield through optimization of maturity and improved disease resistance. The main deliverables related to molecular breeding in soybean will be reviewed here. These include: (1) SNP datasets capturing the genetic diversity within cultivated soybean (both within a worldwide collection of > 1,000 soybean accessions and a subset of 102 short-season accessions (MG0 and earlier) directly relevant to this group); (2) SNP markers for selecting favorable alleles at key maturity genes as well as loci associated with increased resistance to key pathogens and pests (Phytophthora sojae, Heterodera glycines, Sclerotinia sclerotiorum); (3) diagnostic tools to facilitate the identification and mapping of specific pathotypes of P. sojae; and (4) a genomic prediction approach to identify the most promising combinations of parents. As a result of this fruitful collaboration, breeders have gained new tools and approaches to implement molecular, genomics-informed breeding strategies. We believe these tools and approaches are broadly applicable to soybean breeding efforts around the world.
Affiliation(s)
- François Belzile
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Martine Jean
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Aurélie Tardivel
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
  - Centre de Recherche sur les Grains (CEROM), Saint-Mathieu-de-Beloeil, QC, Canada
- Marc-André Lemay
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Chiheb Boudhrioua
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Amandine Lebreton
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Maxime de Ronne
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Vanessa Tremblay
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Caroline Labbé
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Louise O’Donoughue
  - Centre de Recherche sur les Grains (CEROM), Saint-Mathieu-de-Beloeil, QC, Canada
- Vincent-Thomas Boucher St-Amour
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada
  - Centre de Recherche sur les Grains (CEROM), Saint-Mathieu-de-Beloeil, QC, Canada
- Tanya Copley
  - Centre de Recherche sur les Grains (CEROM), Saint-Mathieu-de-Beloeil, QC, Canada
- Eric Fortier
  - Centre de Recherche sur les Grains (CEROM), Saint-Mathieu-de-Beloeil, QC, Canada
- Benjamin Mimee
  - Agriculture and Agri-Food Canada, St-Jean-sur-Richelieu, QC, Canada
- Elroy Cober
  - Agriculture and Agri-Food Canada, Ottawa, ON, Canada
- Istvan Rajcan
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Tom Warkentin
  - Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Éric Gagnon
  - Semences Prograin Inc., Saint-Césaire, QC, Canada
  - Sevita Genetics, Inkerman, ON, Canada
- Richard Bélanger
  - Département de Phytologie, Université Laval, Quebec City, QC, Canada