1
Hung TH, Wu ETY, Zeltiņš P, Jansons Ā, Ullah A, Erbilgin N, Bohlmann J, Bousquet J, Birol I, Clegg SM, MacKay JJ. Long-insert sequence capture detects high copy numbers in a defence-related beta-glucosidase gene βglu-1 with large variations in white spruce but not Norway spruce. BMC Genomics 2024; 25:118. [PMID: 38281030 PMCID: PMC10821269 DOI: 10.1186/s12864-024-09978-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 01/05/2024] [Indexed: 01/29/2024]
Abstract
Conifers are long-lived and slow-evolving, thus requiring effective defences against their fast-evolving insect natural enemies. The copy number variation (CNV) of two key acetophenone biosynthesis genes Ugt5/Ugt5b and βglu-1 may provide a plausible mechanism underlying the constitutively variable defence in white spruce (Picea glauca) against its primary defoliator, spruce budworm. This study develops a long-insert sequence capture probe set (Picea_hung_p1.0) for quantifying copy number of βglu-1-like, Ugt5-like genes and single-copy genes on 38 Norway spruce (Picea abies) and 40 P. glauca individuals from eight and nine provenances across Europe and North America respectively. We developed local assemblies (Piabi_c1.0 and Pigla_c.1.0), full-length transcriptomes (PIAB_v1 and PIGL_v1), and gene models to characterise the diversity of βglu-1 and Ugt5 genes. We observed very large copy numbers of βglu-1, with up to 381 copies in a single P. glauca individual. We observed among-provenance CNV of βglu-1 in P. glauca but not P. abies. Ugt5b was predominantly single-copy in both species. This study generates critical hypotheses for testing the emergence and mechanism of extreme CNV, the dosage effect on phenotype, and the varying copy number of genes with the same pathway. We demonstrate new approaches to overcome experimental challenges in genomic research in conifer defences.
Affiliation(s)
- Tin Hang Hung
  - Department of Biology, University of Oxford, Oxford, OX1 3RB, UK
- Ernest T Y Wu
  - Department of Biology, University of Oxford, Oxford, OX1 3RB, UK
- Pauls Zeltiņš
  - Latvian State Forest Research Institute "Silava", Salaspils, 2169, Latvia
- Āris Jansons
  - Latvian State Forest Research Institute "Silava", Salaspils, 2169, Latvia
- Aziz Ullah
  - Department of Renewable Resources, University of Alberta, Edmonton, AB, T6G 2E3, Canada
- Nadir Erbilgin
  - Department of Renewable Resources, University of Alberta, Edmonton, AB, T6G 2E3, Canada
- Joerg Bohlmann
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Jean Bousquet
  - Canada Research Chair in Forest Genomics, Forest Research Centre, Université Laval, Québec, QC, G1V 0A6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Sonya M Clegg
  - Department of Biology, University of Oxford, Oxford, OX1 3RB, UK
- John J MacKay
  - Department of Biology, University of Oxford, Oxford, OX1 3RB, UK

2
Abstract
Polyploidizations, or whole-genome duplications (WGDs), in plants have increased biological complexity, facilitated evolutionary innovation, and likely enabled adaptation under harsh conditions. Besides genomic data, transcriptome data have been widely employed to detect WGDs, due to their efficient accessibility to the gene space of a species. Age distributions based on synonymous substitutions (so-called KS age distributions) for paralogs assembled from transcriptome data have identified numerous WGDs in plants, paving the way for further studies on the importance of WGDs for the evolution of seed and flowering plants. However, it is still unclear how transcriptome-based age distributions compare to those based on genomic data. In this chapter, we implemented three different de novo transcriptome assembly pipelines with two popular assemblers, i.e., Trinity and SOAPdenovo-Trans. We selected six plant species with published genomes and transcriptomes to evaluate how assembled transcripts from different pipelines perform when using KS distributions to detect previously documented WGDs in the six species. Further, using genes predicted in each genome as references, we evaluated the effects of missing genes, gene family clustering, and de novo assembled transcripts on the transcriptome-based KS distributions. Our results show that, although the transcriptome-based KS distributions differ from the genome-based ones with respect to their shapes and scales, they are still reasonably reliable for unveiling WGDs, except in species where most duplicates originated from a recent WGD. We also discuss how to overcome some possible pitfalls when using transcriptome data to identify WGDs.
Affiliation(s)
- Jia Li
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Ghent, Belgium
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Zhen Li
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium

3
Galla SJ, Brown L, Couch-Lewis Ngāi Tahu Te Hapū O Ngāti Wheke Ngāti Waewae Y, Cubrinovska I, Eason D, Gooley RM, Hamilton JA, Heath JA, Hauser SS, Latch EK, Matocq MD, Richardson A, Wold JR, Hogg CJ, Santure AW, Steeves TE. The relevance of pedigrees in the conservation genomics era. Mol Ecol 2021; 31:41-54. [PMID: 34553796 PMCID: PMC9298073 DOI: 10.1111/mec.16192] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/05/2021] [Revised: 09/12/2021] [Accepted: 09/17/2021] [Indexed: 01/21/2023]
Abstract
Over the past 50 years conservation genetics has developed a substantive toolbox to inform species management. One of the most long‐standing tools available to manage genetics—the pedigree—has been widely used to characterize diversity and maximize evolutionary potential in threatened populations. Now, with the ability to use high throughput sequencing to estimate relatedness, inbreeding, and genome‐wide functional diversity, some have asked whether it is warranted for conservation biologists to continue collecting and collating pedigrees for species management. In this perspective, we argue that pedigrees remain a relevant tool, and when combined with genomic data, create an invaluable resource for conservation genomic management. Genomic data can address pedigree pitfalls (e.g., founder relatedness, missing data, uncertainty), and in return robust pedigrees allow for more nuanced research design, including well‐informed sampling strategies and quantitative analyses (e.g., heritability, linkage) to better inform genomic inquiry. We further contend that building and maintaining pedigrees provides an opportunity to strengthen trusted relationships among conservation researchers, practitioners, Indigenous Peoples, and Local Communities.
Affiliation(s)
- Stephanie J Galla
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
  - School of Biological Sciences, University of Canterbury, Christchurch, Canterbury, New Zealand
- Liz Brown
  - New Zealand Department of Conservation, Twizel, Canterbury, New Zealand
- Ilina Cubrinovska
  - School of Biological Sciences, University of Canterbury, Christchurch, Canterbury, New Zealand
- Daryl Eason
  - New Zealand Department of Conservation, Invercargill, Southland, New Zealand
- Rebecca M Gooley
  - Smithsonian-Mason School of Conservation, Front Royal, Maryland, USA
  - Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
- Jill A Hamilton
  - Department of Biological Sciences, North Dakota State University, Fargo, North Dakota, USA
- Julie A Heath
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
- Samantha S Hauser
  - Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
- Emily K Latch
  - Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
- Marjorie D Matocq
  - Department of Natural Resources and Environmental Science, Program in Ecology, Evolution and Conservation Biology, University of Nevada Reno, Reno, Nevada, USA
- Anne Richardson
  - The Isaac Conservation and Wildlife Trust, Christchurch, Canterbury, New Zealand
- Jana R Wold
  - School of Biological Sciences, University of Canterbury, Christchurch, Canterbury, New Zealand
- Carolyn J Hogg
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Anna W Santure
  - School of Biological Sciences, University of Auckland, Auckland, Auckland, New Zealand
- Tammy E Steeves
  - School of Biological Sciences, University of Canterbury, Christchurch, Canterbury, New Zealand

4
Williams AM, Itgen MW, Broz AK, Carter OG, Sloan DB. Long-read transcriptome and other genomic resources for the angiosperm Silene noctiflora. G3 (Bethesda, Md.) 2021; 11:jkab189. [PMID: 34849814 PMCID: PMC8496259 DOI: 10.1093/g3journal/jkab189] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Accepted: 05/20/2021] [Indexed: 01/04/2023]
Abstract
The angiosperm genus Silene is a model system for several traits of ecological and evolutionary significance in plants, including breeding system and sex chromosome evolution, host-pathogen interactions, invasive species biology, heavy metal tolerance, and cytonuclear interactions. Despite its importance, genomic resources for this large genus of approximately 850 species are scarce, with only one published whole-genome sequence (from the dioecious species Silene latifolia). Here, we provide genomic and transcriptomic resources for a hermaphroditic representative of this genus (S. noctiflora), including a PacBio Iso-Seq transcriptome, which uses long-read, single-molecule sequencing technology to analyze full-length mRNA transcripts. Using these data, we have assembled and annotated high-quality full-length cDNA sequences for approximately 14,126 S. noctiflora genes and 25,317 isoforms. We demonstrated the utility of these data to distinguish between recent and highly similar gene duplicates by identifying novel paralogous genes in an essential protease complex. Furthermore, we provide a draft assembly for the approximately 2.7-Gb genome of this species, which is near the upper range of genome-size values reported for diploids in this genus and threefold larger than the 0.9-Gb genome of Silene conica, another species in the same subgenus. Karyotyping confirmed that S. noctiflora is a diploid, indicating that its large genome size is not due to polyploidization. These resources should facilitate further study and development of this genus as a model in plant ecology and evolution.
Affiliation(s)
- Alissa M Williams
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
  - Cell and Molecular Biology Graduate Program, Colorado State University, Fort Collins, CO 80523, USA
- Michael W Itgen
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Amanda K Broz
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Olivia G Carter
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Daniel B Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO 80523, USA

5
Grassa CJ, Weiblen GD, Wenger JP, Dabney C, Poplawski SG, Timothy Motley S, Michael TP, Schwartz CJ. A new Cannabis genome assembly associates elevated cannabidiol (CBD) with hemp introgressed into marijuana. The New Phytologist 2021; 230:1665-1679. [PMID: 33521943 PMCID: PMC8248131 DOI: 10.1111/nph.17243] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 08/31/2020] [Accepted: 01/18/2021] [Indexed: 05/20/2023]
Abstract
Demand for cannabidiol (CBD), the predominant cannabinoid in hemp (Cannabis sativa), has favored cultivars producing unprecedented quantities of CBD. We investigated the ancestry of a new cultivar and cannabinoid synthase genes in relation to cannabinoid inheritance. A nanopore-based assembly anchored to a high-resolution linkage map provided a chromosome-resolved genome for CBDRx, a potent CBD-type cultivar. We measured cannabinoid synthase expression by cDNA sequencing and conducted a population genetic analysis of diverse Cannabis accessions. Quantitative trait locus mapping of cannabinoids in a hemp × marijuana segregating population was also performed. Cannabinoid synthase paralogs are arranged in tandem arrays embedded in long terminal repeat retrotransposons on chromosome 7. Although CBDRx is predominantly of marijuana ancestry, the genome has cannabidiolic acid synthase (CBDAS) introgressed from hemp and lacks a complete sequence for tetrahydrocannabinolic acid synthase (THCAS). Three additional genomes, including one with complete THCAS, confirmed this genomic structure. Only cannabidiolic acid synthase (CBDAS) was expressed in CBD-type Cannabis, while both CBDAS and THCAS were expressed in a cultivar with an intermediate tetrahydrocannabinol (THC) : CBD ratio. Although variation among cannabinoid synthase loci might affect the THC : CBD ratio, variability among cultivars in overall cannabinoid content (potency) was also associated with other chromosomes.
Affiliation(s)
- George D. Weiblen
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108, USA
- Jonathan P. Wenger
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108, USA
- Clemon Dabney
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN 55108, USA
- S. Timothy Motley
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Todd P. Michael
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
  - Present address: Molecular and Cellular Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- C. J. Schwartz
  - Sunrise Genetics Inc., Fort Collins, CO 80525, USA
  - Present address: Industrial Hemp Genetics LLC, Madison, WI 53705, USA

6
Madritsch S, Burg A, Sehr EM. Comparing de novo transcriptome assembly tools in di- and autotetraploid non-model plant species. BMC Bioinformatics 2021; 22:146. [PMID: 33752598 PMCID: PMC7986043 DOI: 10.1186/s12859-021-04078-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/05/2020] [Accepted: 03/15/2021] [Indexed: 01/15/2023]
Abstract
Background Polyploidy is very common in plants and can be seen as one of the key drivers in the domestication of crops and the establishment of important agronomic traits. It can be the main source of genomic repatterning and introduces gene duplications, affecting gene expression and alternative splicing. Since fully sequenced genomes are not yet available for many plant species including crops, de novo transcriptome assembly is the basis to understand molecular and functional mechanisms. However, in complex polyploid plants, de novo transcriptome assembly is challenging, leading to increased rates of fused or redundant transcripts. Since assemblers were developed mainly for diploid organisms, they may not be well suited for polyploids. Also, comparative evaluations of these tools on higher polyploid plants are extremely rare. Thus, our aim was to fill this gap and to provide a basic guideline for choosing the optimal de novo assembly strategy focusing on autotetraploids, as the scientific interest in this type of polyploidy is steadily increasing. Results We present a comparison of two common (SOAPdenovo-Trans, Trinity) and one recently published transcriptome assembler (TransLiG) on diploid and autotetraploid species of the genera Acer and Vaccinium using Arabidopsis thaliana as a reference. The number of assembled transcripts was up to 11 and 14 times higher with an increased number of short transcripts for Acer and Vaccinium, respectively, compared to A. thaliana. In diploid samples, Trinity and TransLiG performed similarly well, while in autotetraploids TransLiG assembled the most complete transcriptomes, with an average of 1916 assembled BUSCOs vs. 1705 BUSCOs for Trinity. Of all three assemblers, SOAPdenovo-Trans performed worst (1133 complete BUSCOs). Conclusion All three assembly tools produced complete assemblies when dealing with the model organism A. thaliana, independently of its ploidy level, but their performance differed greatly when it came to non-model autotetraploids, where specifically TransLiG and Trinity produced a high number of redundant transcripts. The recently published assembler TransLiG had not yet been tested on any plant organism but showed the highest completeness and full-length transcriptomes, especially in autotetraploids. Including such species during the development and testing of new assembly tools is highly appreciated and recommended as many important crops are polyploid. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04078-8.
Affiliation(s)
- Silvia Madritsch
  - AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria
  - Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, Austria
- Agnes Burg
  - AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria
- Eva M Sehr
  - AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria

7
Zhao Z, Zhou Y, Wang S, Zhang X, Wang C, Li S. LDscaff: LD-based scaffolding of de novo genome assemblies. BMC Bioinformatics 2020; 21:570. [PMID: 33371875 PMCID: PMC7768660 DOI: 10.1186/s12859-020-03895-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/17/2020] [Accepted: 11/18/2020] [Indexed: 12/11/2022]
Abstract
Background Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized for the task of scaffolding. Genetic recombination patterns in population data indicate non-random association among alleles at different loci and can provide physical distance signals to guide scaffolding. Results In this paper, we propose LDscaff for draft genome assembly incorporating linkage disequilibrium information in population data. We evaluated the performance of our method with both simulated data and real data. We simulated scaffolds by splitting the pig reference genome and reassembled them. Gaps between scaffolds were introduced ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. Then we applied our method to refine the Giant Panda genome and the donkey genome, which were assembled purely from NGS data. After LDscaff treatment, the resulting Panda assembly has a scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The re-assembled donkey assembly has an improved N50 length of 32.1 MB, up from 23.8 MB. Conclusions Our method effectively improves assemblies using existing re-sequencing data and is a potential alternative to existing assemblers that require the collection of new data.
Affiliation(s)
- Zicheng Zhao
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, 999077, China
- Yingxiao Zhou
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
  - BGI-Shenzhen, Shenzhen, 518083, China
- Shuai Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, 999077, China
- Xiuqing Zhang
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China
- Changfa Wang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng City, 252059, Shandong, China
- Shuaicheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, 999077, China

8
Zhou C, Olukolu B, Gemenet DC, Wu S, Gruneberg W, Cao MD, Fei Z, Zeng ZB, George AW, Khan A, Yencho GC, Coin LJM. Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations. Nat Genet 2020; 52:1256-1264. [PMID: 33128049 DOI: 10.1038/s41588-020-00717-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/10/2017] [Accepted: 09/15/2020] [Indexed: 12/31/2022]
Abstract
Despite advances in sequencing technologies, assembly of complex plant genomes remains elusive due to polyploidy and high repeat content. Here we report PolyGembler for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate method with which to detect and fix assembly errors. Using simulated data, we demonstrate that our approach is of high accuracy and outperforms three existing state-of-the-art genetic mapping tools. Particularly, our approach is more robust to the presence of missing genotype data and genotyping errors. We used our method to construct pseudomolecules for allotetraploid lawn grass utilizing PacBio long reads in combination with restriction site-associated DNA sequencing, and for diploid Ipomoea trifida and autotetraploid potato utilizing contigs assembled from Illumina reads in combination with genotype data generated by single-nucleotide polymorphism arrays and genotyping by sequencing, respectively. We resolved 13 assembly errors for a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.
Affiliation(s)
- Chenxi Zhou
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
  - Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Bode Olukolu
  - Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Dorcus C Gemenet
  - International Potato Center, Lima, Peru
  - CGIAR Excellence in Breeding Platform, International Maize and Wheat Improvement Center, Nairobi, Kenya
- Shan Wu
  - Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
- Minh Duc Cao
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Zhangjun Fei
  - Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
- Zhao-Bang Zeng
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Andrew W George
  - Data61, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Queensland, Australia
- Awais Khan
  - International Potato Center, Lima, Peru
  - Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Geneva, NY, USA
- G Craig Yencho
  - Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Lachlan J M Coin
  - Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
  - Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia
  - The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia

9
Fuller ZL, Mocellin VJL, Morris LA, Cantin N, Shepherd J, Sarre L, Peng J, Liao Y, Pickrell J, Andolfatto P, Matz M, Bay LK, Przeworski M. Population genetics of the coral Acropora millepora: Toward genomic prediction of bleaching. Science 2020; 369:eaba4674. [PMID: 32675347 DOI: 10.1126/science.aba4674] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 12/05/2019] [Accepted: 06/01/2020] [Indexed: 12/11/2022]
Abstract
Although reef-building corals are declining worldwide, responses to bleaching vary within and across species and are partly heritable. Toward predicting bleaching response from genomic data, we generated a chromosome-scale genome assembly for the coral Acropora millepora. We obtained whole-genome sequences for 237 phenotyped samples collected at 12 reefs along the Great Barrier Reef, among which we inferred little population structure. Scanning the genome for evidence of local adaptation, we detected signatures of long-term balancing selection in the heat-shock co-chaperone sacsin. We conducted a genome-wide association study of visual bleaching score for 213 samples, incorporating the polygenic score derived from it into a predictive model for bleaching in the wild. These results set the stage for genomics-based approaches in conservation strategies.
Affiliation(s)
- Zachary L Fuller
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Luke A Morris
  - Australian Institute of Marine Science, Townsville, QLD, Australia
  - AIMS@JCU, Australian Institute of Marine Science, College of Science and Engineering, James Cook University, Townsville, QLD, Australia
  - College of Science and Engineering, James Cook University, Townsville, QLD, Australia
- Neal Cantin
  - Australian Institute of Marine Science, Townsville, QLD, Australia
- Jihanne Shepherd
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Luke Sarre
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Julie Peng
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Yi Liao
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, USA
- Peter Andolfatto
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Mikhail Matz
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Line K Bay
  - Australian Institute of Marine Science, Townsville, QLD, Australia
- Molly Przeworski
  - Department of Biological Sciences, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Program for Mathematical Genomics, Columbia University, New York, NY, USA

10
Waterhouse RM, Aganezov S, Anselmetti Y, Lee J, Ruzzante L, Reijnders MJMF, Feron R, Bérard S, George P, Hahn MW, Howell PI, Kamali M, Koren S, Lawson D, Maslen G, Peery A, Phillippy AM, Sharakhova MV, Tannier E, Unger MF, Zhang SV, Alekseyev MA, Besansky NJ, Chauve C, Emrich SJ, Sharakhov IV. Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022]
Abstract
Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Affiliation(s)
- Robert M Waterhouse
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| | - Sergey Aganezov
- Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | | | - Jiyoung Lee
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Livio Ruzzante
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Maarten J M F Reijnders
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Romain Feron
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Sèverine Bérard
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Phillip George
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Matthew W Hahn
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Paul I Howell
- Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
| | - Maryam Kamali
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Ashley Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Maria V Sharakhova
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France; Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France
| | - Maria F Unger
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Simo V Zhang
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Max A Alekseyev
- Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA
| | - Nora J Besansky
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
| | - Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
| | - Igor V Sharakhov
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050.
| |
|
11
|
Flagel LE, Blackman BK, Fishman L, Monnahan PJ, Sweigart A, Kelly JK. GOOGA: A platform to synthesize mapping experiments and identify genomic structural diversity. PLoS Comput Biol 2019; 15:e1006949. [PMID: 30986215 PMCID: PMC6483263 DOI: 10.1371/journal.pcbi.1006949] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2018] [Revised: 04/25/2019] [Accepted: 03/15/2019] [Indexed: 11/18/2022] Open
Abstract
Understanding genomic structural variation such as inversions and translocations is a key challenge in evolutionary genetics. We develop a novel statistical approach to comparative genetic mapping to detect large-scale structural mutations from low-level sequencing data. The procedure, called Genome Order Optimization by Genetic Algorithm (GOOGA), couples a Hidden Markov Model with a Genetic Algorithm to analyze data from genetic mapping populations. We demonstrate the method using both simulated data (calibrated from experiments on Drosophila melanogaster) and real data from five distinct crosses within the flowering plant genus Mimulus. Application of GOOGA to the Mimulus data corrects numerous errors (misplaced sequences) in the M. guttatus reference genome and confirms or detects eight large inversions polymorphic within the species complex. Finally, we show how this method can be applied in genomic scans to improve the accuracy and resolution of Quantitative Trait Locus (QTL) mapping.
Affiliation(s)
- Lex E. Flagel
- Bayer Crop Science, Chesterfield, MO, United States of America
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, United States of America
- * E-mail: (LEF); (JKK)
| | - Benjamin K. Blackman
- Department of Plant and Microbial Biology, University of California—Berkeley, Berkeley, CA, United States of America
| | - Lila Fishman
- Division of Biological Sciences, University of Montana, Missoula, MT, United States of America
| | - Patrick J. Monnahan
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States of America
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, United States of America
| | - Andrea Sweigart
- Department of Genetics, University of Georgia, Athens, GA, United States of America
| | - John K. Kelly
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States of America
- * E-mail: (LEF); (JKK)
| |
|
12
|
Pengelly RJ, Collins A. Linkage disequilibrium maps to guide contig ordering for genome assembly. Bioinformatics 2018; 35:541-545. [DOI: 10.1093/bioinformatics/bty687] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2018] [Revised: 07/13/2018] [Accepted: 08/03/2018] [Indexed: 11/12/2022] Open
Affiliation(s)
- Reuben J Pengelly
- Genetic Epidemiology & Bioinformatics, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Andrew Collins
- Genetic Epidemiology & Bioinformatics, Faculty of Medicine, University of Southampton, Southampton, UK
| |
|
13
|
Sefick SA, Castronova MA, Stevison LS. genotypeR: An integrated R package for single nucleotide polymorphism genotype marker design and data analysis. Methods Ecol Evol 2018. [DOI: 10.1111/2041-210x.12965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
14
|
Clonorchis sinensis and Clonorchiasis: The Relevance of Exploring Genetic Variation. ADVANCES IN PARASITOLOGY 2018; 100:155-208. [PMID: 29753338 DOI: 10.1016/bs.apar.2018.03.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Parasitic trematodes (flukes) cause substantial mortality and morbidity in humans. The Chinese liver fluke, Clonorchis sinensis, is one of the most destructive parasitic worms in humans in China, Vietnam, Korea and the Russian Far East. Although C. sinensis infection can be controlled relatively well using anthelmintics, the worm is carcinogenic, inducing cholangiocarcinoma and causing major suffering in ~15 million people in Asia. This chapter provides an account of C. sinensis and clonorchiasis research, covering aspects of biology, epidemiology, pathogenesis and immunity, diagnosis, treatment and control, genetics and genomics. It also describes progress in the area of molecular biology (genetics, genomics, transcriptomics and proteomics) and highlights challenges associated with comparative genomics and population genetics. It then reviews recent advances in the sequencing and characterisation of the mitochondrial and nuclear genomes for a Korean isolate of C. sinensis and summarises salient comparative genomic work and the implications thereof. The chapter concludes by considering how advances in genomics and informatics will enable research on the genetics of C. sinensis and related parasites, as well as the discovery of new fluke-specific intervention targets.
|
15
|
Pfeifer SP. Direct estimate of the spontaneous germ line mutation rate in African green monkeys. Evolution 2017; 71:2858-2870. [PMID: 29068052 DOI: 10.1111/evo.13383] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Revised: 10/03/2017] [Accepted: 10/09/2017] [Indexed: 12/30/2022]
Abstract
Here, I provide the first direct estimate of the spontaneous mutation rate in an Old World monkey, using a seven individual, three-generation pedigree of African green monkeys. Eight de novo mutations were identified within ∼1.5 Gbp of accessible genome, corresponding to an estimated point mutation rate of 0.94 × 10⁻⁸ per site per generation, suggesting an effective population size of ∼12000 for the species. This estimation represents a significant improvement in our knowledge of the population genetics of the African green monkey, one of the most important nonhuman primate models in biomedical research. Furthermore, by comparing mutation rates in Old World monkeys with the only other direct estimates in primates to date (humans and chimpanzees), it is possible to uniquely address how mutation rates have evolved over longer time scales. While the estimated spontaneous mutation rate for African green monkeys is slightly lower than the rate of 1.2 × 10⁻⁸ per base pair per generation reported in chimpanzees, it is similar to the lower range of rates of 0.96 × 10⁻⁸ to 1.28 × 10⁻⁸ per base pair per generation recently estimated from whole genome pedigrees in humans. This result suggests a long-term constraint on mutation rate that is quite different from similar evidence pertaining to recombination rate evolution in primates.
Affiliation(s)
- Susanne P Pfeifer
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland; School of Life Sciences, Arizona State University (ASU), Tempe, Arizona 85281
| |
|
16
|
Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing. G3-GENES GENOMES GENETICS 2016; 6:2203-11. [PMID: 27226165 PMCID: PMC4938673 DOI: 10.1534/g3.115.026690] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.
|
17
|
Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, McManus KF, Bustamante CD, Hammer MF, Wall JD. The Time Scale of Recombination Rate Evolution in Great Apes. Mol Biol Evol 2016; 33:928-45. [PMID: 26671457 PMCID: PMC5870646 DOI: 10.1093/molbev/msv331] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequence data from 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez J, et al. 2013. Great ape genetic diversity and population history. Nature 499:471-475). We also identified species-specific recombination hotspots in each group using a modified LDhot framework, which greatly improves statistical power to detect hotspots at varying strengths. We show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time scale of complete hotspot turnover. Further, using species-specific PRDM9 sequences to predict potential binding sites (PBS), we show higher predicted PRDM9 binding in recombination hotspots as compared to matched cold spot regions in multiple great ape species, including at least one chimpanzee subspecies. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences as compared to other great ape species perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, on the basis of multiple linear regression analysis, we found that various correlates of recombination rate persist throughout the African great apes including repeats, diversity, and divergence. Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.
Affiliation(s)
- Laurie S Stevison
- Institute for Human Genetics, University of California San Francisco; Department of Biological Sciences, Auburn University
| | - August E Woerner
- Arizona Research Laboratories, Division of Biotechnology, University of Arizona; Department of Genetics, University of Arizona
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan; Department of Computational Medicine & Bioinformatics, University of Michigan
| | - Joanna L Kelley
- School of Biological Sciences, Washington State University; Department of Genetics, Stanford University
| | - Krishna R Veeramah
- Arizona Research Laboratories, Division of Biotechnology, University of Arizona; Department of Ecology and Evolution, Stony Brook University
| | - Kimberly F McManus
- Department of Biology, Stanford University; Department of Biomedical Informatics, Stanford University
| | | | - Michael F Hammer
- Arizona Research Laboratories, Division of Biotechnology, University of Arizona; Department of Ecology and Evolutionary Biology, University of Arizona; Department of Anthropology, University of Arizona
| | - Jeffrey D Wall
- Institute for Human Genetics, University of California San Francisco; Department of Epidemiology & Biostatistics, University of California San Francisco
| |
|
18
|
Martin G, Baurens FC, Droc G, Rouard M, Cenci A, Kilian A, Hastie A, Doležel J, Aury JM, Alberti A, Carreel F, D'Hont A. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 2016; 17:243. [PMID: 26984673 PMCID: PMC4793746 DOI: 10.1186/s12864-016-2579-4] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2015] [Accepted: 03/08/2016] [Indexed: 12/04/2022] Open
Abstract
Background: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). Results: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools offer an expert mode in which the user can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80 %), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5 % of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70 %. Unknown sites (N) were reduced from 17.3 to 10.0 %. Conclusion: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
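The contiguity figures quoted in this abstract (scaffold count, N50, and the number of scaffolds needed to reach N50) can be illustrated with a minimal sketch. The function name `n50_l50` and the toy scaffold lengths are assumptions made for illustration only; they are not taken from the authors' pipeline or data. Only the scaffold counts 7513 and 1532 come from the abstract.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50 is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size. L50 is how many scaffolds were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive numbers")


# Toy example (hypothetical lengths): joining the two longest scaffolds
# raises N50 and lowers L50 without changing the total assembly size.
before = [10, 8, 6, 4, 2]
after = [18, 6, 4, 2]
print(n50_l50(before))
print(n50_l50(after))

# Relative reduction in scaffold number reported in the abstract.
reduction_pct = round(100 * (7513 - 1532) / 7513, 1)
print(reduction_pct)
```

The last line reproduces the "by 80 %" figure from the two scaffold counts given in the abstract; the N50 values themselves cannot be recomputed without the scaffold lengths.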
Affiliation(s)
- Guillaume Martin
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Franc-Christophe Baurens
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Gaëtan Droc
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Mathieu Rouard
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France
| | - Alberto Cenci
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France
| | - Andrzej Kilian
- Diversity Arrays Technology, Yarralumla, Australian Capital Territory, 2600, Australia
| | - Alex Hastie
- BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA, 92121, USA
| | - Jaroslav Doležel
- Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371, Olomouc, Czech Republic
| | - Jean-Marc Aury
- Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France
| | - Adriana Alberti
- Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France
| | - Françoise Carreel
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France
| | - Angélique D'Hont
- CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
| |
|
19
|
Application of Population Sequencing (POPSEQ) for Ordering and Imputing Genotyping-by-Sequencing Markers in Hexaploid Wheat. G3-GENES GENOMES GENETICS 2015; 5:2547-53. [PMID: 26530417 PMCID: PMC4683627 DOI: 10.1534/g3.115.020362] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
The advancement of next-generation sequencing technologies in conjunction with new bioinformatics tools enabled fine-tuning of sequence-based, high-resolution mapping strategies for complex genomes. Although genotyping-by-sequencing (GBS) provides a large number of markers, its application for association mapping and genomics-assisted breeding is limited by a large proportion of missing data per marker. For species with a reference genomic sequence, markers can be ordered on the physical map. However, in the absence of reference marker order, the use and imputation of GBS markers is challenging. Here, we demonstrate how the population sequencing (POPSEQ) approach can be used to provide marker context for GBS in wheat. The utility of a POPSEQ-based genetic map as a reference map to create genetically ordered markers on a chromosome for hexaploid wheat was validated by constructing an independent de novo linkage map of GBS markers from a Synthetic W7984 × Opata M85 recombinant inbred line (SynOpRIL) population. The results indicated that there is strong agreement between the independent de novo linkage map and the POPSEQ mapping approach in mapping and ordering GBS markers for hexaploid wheat. After ordering, a large number of GBS markers were imputed, thus providing a high-quality reference map that can be used for QTL mapping for different traits. The POPSEQ-based reference map and whole-genome sequence assemblies are valuable resources that can be used to order GBS markers and enable the application of highly accurate imputation methods to leverage the application of GBS markers in wheat.
|
20
|
A Male-Specific Genetic Map of the Microcrustacean Daphnia pulex Based on Single-Sperm Whole-Genome Sequencing. Genetics 2015; 201:31-8. [PMID: 26116153 DOI: 10.1534/genetics.115.179028] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2015] [Accepted: 06/24/2015] [Indexed: 12/12/2022] Open
Abstract
Genetic linkage maps are critical for assembling draft genomes to a meaningful chromosome level and for deciphering the genomic underpinnings of biological traits. The estimates of recombination rates derived from genetic maps also play an important role in understanding multiple aspects of genomic evolution such as nucleotide substitution patterns and accumulation of deleterious mutations. In this study, we developed a high-throughput experimental approach that combines fluorescence-activated cell sorting, whole-genome amplification, and short-read sequencing to construct a genetic map using single-sperm cells. Furthermore, a computational algorithm was developed to analyze single-sperm whole-genome sequencing data for map construction. These methods allowed us to rapidly build a male-specific genetic map for the freshwater microcrustacean Daphnia pulex, which shows significant improvements compared to a previous map. With a total of 1672 mapped haplotype blocks and an average intermarker distance of 0.87 cM, this map spans a total genetic distance of 1451 Kosambi cM and comprises 90% of the resolved regions in the current Daphnia reference assembly. The map also reveals the mistaken mapping of seven scaffolds in the reference assembly onto chromosome II by a previous microsatellite map based on F2 crosses. Our approach can be easily applied to many other organisms and holds great promise for unveiling the intragenomic and intraspecific variation in the recombination rates.
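The map length in this abstract is given in Kosambi centimorgans. As a minimal sketch, assuming the standard Kosambi mapping function d = 25 · ln((1 + 2r) / (1 − 2r)) cM for a recombination fraction r, the conversion can be written as below. The function name `kosambi_cm` is hypothetical and this is not the authors' map-construction algorithm. The final check only confirms that the reported total length divided by the reported number of haplotype blocks is consistent with the reported average spacing.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to Kosambi cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# For small r the map distance is close to 100 * r; it grows faster as r
# approaches 0.5 because the function corrects for undetected crossovers.
for r in (0.01, 0.1, 0.3):
    print(r, round(kosambi_cm(r), 2))

# Consistency check on the figures quoted in the abstract:
# total map length (cM) divided by the number of haplotype blocks.
mean_spacing = round(1451 / 1672, 2)
print(mean_spacing)
```

Dividing by the number of blocks rather than the number of intervals is a simplification; with this many markers the difference does not change the rounded value.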
|
21
|
Fierst JL. Using linkage maps to correct and scaffold de novo genome assemblies: methods, challenges, and computational tools. Front Genet 2015; 6:220. [PMID: 26150829 PMCID: PMC4473057 DOI: 10.3389/fgene.2015.00220] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2015] [Accepted: 06/08/2015] [Indexed: 01/05/2023] Open
Abstract
Modern high-throughput DNA sequencing has made it possible to inexpensively produce genome sequences, but in practice many of these draft genomes are fragmented and incomplete. Genetic linkage maps based on recombination rates between physical markers have been used in biology for over 100 years and a linkage map, when paired with a de novo sequencing project, can resolve mis-assemblies and anchor chromosome-scale sequences. Here, I summarize the methodology behind integrating de novo assemblies and genetic linkage maps, outline the current challenges, review the available software tools, and discuss new mapping technologies.
Affiliation(s)
- Janna L. Fierst
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL, USA
| |
|
22
|
Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 2015. [PMID: 25637298 DOI: 10.1186/s13059-015-0582-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to or even exceed those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Affiliation(s)
- Jarrod A Chapman
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Aydın Buluç
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Kerrie Barry
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Evangelos Georganas
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Adam Session
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
| | - Veronika Strnadova
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA.
| | - Jerry Jenkins
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Sunish Sehgal
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA; Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57007, USA.
| | - Leonid Oliker
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Jeremy Schmutz
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Katherine A Yelick
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Robbie Waugh
- Division of Plant Sciences, University of Dundee & The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK.
| | - Jesse A Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA.
| | - Gary J Muehlbauer
- Departments of Agronomy and Plant Genetics, and Plant Biology, University of Minnesota, St Paul, MN, 55108, USA.
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Daniel S Rokhsar
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
| |
|
23
|
Chapman JA, Mascher M, Buluç A, Barry K, Georganas E, Session A, Strnadova V, Jenkins J, Sehgal S, Oliker L, Schmutz J, Yelick KA, Scholz U, Waugh R, Poland JA, Muehlbauer GJ, Stein N, Rokhsar DS. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 2015; 16:26. [PMID: 25637298 PMCID: PMC4373400 DOI: 10.1186/s13059-015-0582-8] [Citation(s) in RCA: 162] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2014] [Accepted: 01/06/2015] [Indexed: 11/10/2022] Open
Abstract
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to or even exceed those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Affiliation(s)
- Jarrod A Chapman
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Aydın Buluç
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Kerrie Barry
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| | - Evangelos Georganas
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Adam Session
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
| | - Veronika Strnadova
- Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA.
| | - Jerry Jenkins
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Sunish Sehgal
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA; Present address: Department of Plant Science, South Dakota State University, Brookings, SD, 57007, USA.
| | - Leonid Oliker
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
| | - Jeremy Schmutz
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; HudsonAlpha Institute of Biotechnology, Huntsville, AL, 35806, USA.
| | - Katherine A Yelick
- Computational Research Division and National Energy Research Supercomputing Center (NERSC), Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Department of Electrical Engineering and Computer Science, Computer Science Division, University of California, Berkeley, CA, 94720, USA.
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Robbie Waugh
- Division of Plant Sciences, University of Dundee & The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, UK.
| | - Jesse A Poland
- Department of Plant Pathology, Kansas State University, Manhattan, KS, 65506, USA.
| | - Gary J Muehlbauer
- Departments of Agronomy and Plant Genetics, and Plant Biology, University of Minnesota, St Paul, MN, 55108, USA.
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany.
| | - Daniel S Rokhsar
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
|
24
|
Extensive error in the number of genes inferred from draft genome assemblies. PLoS Comput Biol 2014; 10:e1003998. [PMID: 25474019 PMCID: PMC4256071 DOI: 10.1371/journal.pcbi.1003998] [Citation(s) in RCA: 172] [Impact Index Per Article: 17.2] [Received: 01/14/2014] [Accepted: 10/22/2014] [Indexed: 11/19/2022] Open
Abstract
Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
The initial publication of the genome sequence of many plants, animals, and microbes is often accompanied by great fanfare. However, these genomes are almost always first drafts, with a lot of missing data, many gaps, and many errors in the published sequences. Compounding this problem, the genes identified in draft genome sequences are also affected by incomplete genome assemblies: the number and exact structure of predicted genes may be incorrect. Here we quantify the extent of such errors by comparing several draft genomes against completed versions of the same sequences. Surprisingly, we find huge numbers of errors in the number of genes predicted from draft assemblies, with more than half of all genes having the wrong number of copies in the draft genomes examined. Our investigation also reveals the major causes of these errors, and further analyses using additional functional data demonstrate that many of the gene predictions can be corrected. The results presented here suggest that many inferences based on published draft genomes may be erroneous, but offer a way forward for future analyses.
|
25
|
Sessa EB, Banks JA, Barker MS, Der JP, Duffy AM, Graham SW, Hasebe M, Langdale J, Li FW, Marchant DB, Pryer KM, Rothfels CJ, Roux SJ, Salmi ML, Sigel EM, Soltis DE, Soltis PS, Stevenson DW, Wolf PG. Between two fern genomes. Gigascience 2014; 3:15. [PMID: 25324969 PMCID: PMC4199785 DOI: 10.1186/2047-217x-3-15] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 08/08/2014] [Accepted: 09/18/2014] [Indexed: 11/10/2022] Open
Abstract
Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves.
Affiliation(s)
- Emily B Sessa
- Department of Biology, Box 118525, University of Florida, Gainesville, FL 32611, USA; Genetics Institute, University of Florida, Box 103610, Gainesville, FL 32611, USA
| | - Jo Ann Banks
- Department of Botany and Plant Pathology, Purdue University, 915 West State Street, West Lafayette, IN 47907, USA
| | - Michael S Barker
- Department of Ecology & Evolutionary Biology, University of Arizona, 1041 East Lowell Street, Tucson, AZ 85721, USA
| | - Joshua P Der
- Department of Biology, Penn State University, 201 Life Science Building, University Park, PA 16801, USA; Current address: Department of Biological Science, California State University, 800 N. State College Blvd., Fullerton, CA 92831, USA
| | - Aaron M Duffy
- Ecology Center and Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322, USA
| | - Sean W Graham
- Department of Botany, University of British Columbia, 3529-6720 University Blvd., Vancouver, BC V6T 1Z4, Canada
| | - Mitsuyasu Hasebe
- National Institute for Basic Biology, 38 Nishigounaka, Myo-daiji-cho, Okazaki 444-8585, Japan
| | - Jane Langdale
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
| | - Fay-Wei Li
- Department of Biology, Duke University, Post Office Box 90338, Durham, NC 27708, USA
| | - D Blaine Marchant
- Department of Biology, Box 118525, University of Florida, Gainesville, FL 32611, USA; Florida Museum of Natural History, Dickinson Hall, University of Florida, Gainesville, FL 32611, USA
| | - Kathleen M Pryer
- Department of Biology, Duke University, Post Office Box 90338, Durham, NC 27708, USA
| | - Carl J Rothfels
- Department of Zoology, University of British Columbia, 2329 W. Mall, Vancouver, BC V6T 1Z4, Canada; Current address: University Herbarium and Department of Integrative Biology, University of California, 1001 Valley Life Sciences Building, Berkeley, CA 94720, USA
| | - Stanley J Roux
- Department of Molecular Biosciences, University of Texas, 205 W. 24th Street, Austin, TX 78712, USA
| | - Mari L Salmi
- Department of Molecular Biosciences, University of Texas, 205 W. 24th Street, Austin, TX 78712, USA
| | - Erin M Sigel
- Department of Biology, Duke University, Post Office Box 90338, Durham, NC 27708, USA
| | - Douglas E Soltis
- Department of Biology, Box 118525, University of Florida, Gainesville, FL 32611, USA; Genetics Institute, University of Florida, Box 103610, Gainesville, FL 32611, USA; Florida Museum of Natural History, Dickinson Hall, University of Florida, Gainesville, FL 32611, USA
| | - Pamela S Soltis
- Genetics Institute, University of Florida, Box 103610, Gainesville, FL 32611, USA; Florida Museum of Natural History, Dickinson Hall, University of Florida, Gainesville, FL 32611, USA
| | - Dennis W Stevenson
- New York Botanical Garden, 2900 Southern Boulevard, Bronx, NY 10458, USA
| | - Paul G Wolf
- Ecology Center and Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322, USA
|
26
|
High-resolution genetic map for understanding the effect of genome-wide recombination rate on nucleotide diversity in watermelon. G3-GENES GENOMES GENETICS 2014; 4:2219-30. [PMID: 25227227 PMCID: PMC4232547 DOI: 10.1534/g3.114.012815] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
We used genotyping by sequencing to identify a set of 10,480 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1096 cM for watermelon. We assessed the genome-wide variation in recombination rate (GWRR) across the map and found an association between GWRR and genome-wide nucleotide diversity. Collinearity between the map and the genome-wide reference sequence for watermelon was studied to identify inconsistency and chromosome rearrangements. We assessed genome-wide nucleotide diversity, linkage disequilibrium (LD), and selective sweep for wild, semi-wild, and domesticated accessions of Citrullus lanatus var. lanatus to track signals of domestication. Principal component analysis combined with chromosome-wide phylogenetic study based on 1563 SNPs obtained after LD pruning with minor allele frequency of 0.05 resolved the differences between semi-wild and wild accessions as well as relationships among worldwide sweet watermelon. Population structure analysis revealed predominant ancestries for wild, semi-wild, and domesticated watermelons as well as admixture of various ancestries that were important for domestication. Sliding window analysis of Tajima’s D across various chromosomes was used to resolve selective sweep. LD decay was estimated for various chromosomes. We identified a strong selective sweep on chromosome 3 consisting of important genes that might have had a role in sweet watermelon domestication.
|
27
|
Pernaci M, De Mita S, Andrieux A, Pétrowski J, Halkett F, Duplessis S, Frey P. Genome-wide patterns of segregation and linkage disequilibrium: the construction of a linkage genetic map of the poplar rust fungus Melampsora larici-populina. FRONTIERS IN PLANT SCIENCE 2014; 5:454. [PMID: 25309554 PMCID: PMC4159982 DOI: 10.3389/fpls.2014.00454] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/31/2014] [Accepted: 08/21/2014] [Indexed: 05/16/2023]
Abstract
The poplar rust fungus Melampsora larici-populina causes significant yield reduction and severe economic losses in commercial poplar plantations. After several decades of breeding for qualitative resistance and subsequent breakdown of the released resistance genes, breeders now focus on quantitative resistance, perceived to be more durable. But quantitative resistance can also be challenged by an increase in the aggressiveness of the pathogen. Thus, it is of primary importance to better understand the genetic architecture of aggressiveness traits. To this end, our goal is to build a genetic linkage map for M. larici-populina in order to map quantitative trait loci related to aggressiveness. First, a large progeny of M. larici-populina was generated through selfing of the reference strain 98AG31 (whose genome sequence is available) on larch plants, the alternate host of the poplar rust fungus. The progeny's meiotic origin was validated through a segregation analysis of 115 offspring with 14 polymorphic microsatellite markers, of which 12 segregated in the expected 1:2:1 Mendelian ratio. A microsatellite-based linkage disequilibrium analysis allowed us to identify one potential linkage group comprising two scaffolds. The whole genome of a subset of 47 offspring was resequenced using the Illumina HiSeq 2000 technology at a mean sequencing depth of 6X. The reads were mapped onto the reference genome of the parental strain and 144,566 SNPs were identified across the genome. Analysis of the distribution and polymorphism of the SNPs along the genome led to the identification of 2580 recombination blocks. A second linkage disequilibrium analysis, using the recombination blocks as markers, allowed us to group 81 scaffolds into 23 potential linkage groups. These preliminary results showed that a high-density linkage map could be constructed by using high-quality SNPs based on low-coverage resequencing of a larger number of M. larici-populina offspring.
Affiliation(s)
- Michaël Pernaci
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Stéphane De Mita
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Axelle Andrieux
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Jérémy Pétrowski
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Fabien Halkett
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Sébastien Duplessis
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
| | - Pascal Frey
- Interactions Arbres - Micro organismes, Institut national de la recherche agronomique, UMR1136, Champenoux, France
- Interactions Arbres - Micro organismes, Université de Lorraine, UMR1136, Vandoeuvre-lès-Nancy, France
|
28
|
Jiang Y, Xu P, Liu Z. Generation of physical map contig-specific sequences. Front Genet 2014; 5:243. [PMID: 25101119 PMCID: PMC4105628 DOI: 10.3389/fgene.2014.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/17/2014] [Accepted: 07/07/2014] [Indexed: 12/13/2022] Open
Abstract
Rapid advances in next-generation sequencing technologies have allowed whole genome sequencing of many species. However, with current sequencing technologies, whole genome sequence assemblies often fall short in one of four quality measures: accuracy, contiguity, connectivity, and completeness. In particular, small contigs and scaffolds limit the applicability of whole genome sequences for genetic analysis. To enhance the quality of whole genome sequence assemblies, particularly their scaffolding, additional genomic resources are required. Among these, sequences derived from known physical locations offer great power for scaffolding. In this mini-review, we describe the principles, procedures and applications of physical-map-derived sequences, with a focus on physical map contig-specific sequences.
Affiliation(s)
- Yanliang Jiang
- Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, China
| | - Peng Xu
- Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, China
| | - Zhanjiang Liu
- Aquatic Genomics Unit, The Fish Molecular Genetics and Biotechnology Laboratory, School of Fisheries, Aquaculture and Aquatic Sciences, and Program of Cell and Molecular Biosciences, Auburn University, AL, USA
|
29
|
Mascher M, Stein N. Genetic anchoring of whole-genome shotgun assemblies. Front Genet 2014; 5:208. [PMID: 25071835 PMCID: PMC4083584 DOI: 10.3389/fgene.2014.00208] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 05/16/2014] [Accepted: 06/19/2014] [Indexed: 12/30/2022] Open
Abstract
The recent advances in sequencing throughput and genome assembly algorithms have established whole-genome shotgun (WGS) assemblies as the cornerstone of the genomic infrastructure for many species. WGS assemblies can be constructed with comparative ease and give a comprehensive representation of the gene space even of large and complex genomes. One major obstacle in utilizing WGS assemblies for important research applications such as gene isolation or comparative genomics has been the lack of chromosomal positioning and contextualization of short sequence contigs. Assigning chromosomal locations to sequence contigs required the construction and integration of genome-wide physical maps and dense genetic linkage maps as well as synteny to model species. Recently, methods to rapidly construct ultra-dense linkage maps encompassing millions of genetic markers from WGS sequencing data of segregating populations have made possible the direct assignment of genetic positions to short sequence contigs. Here, we review recent developments in the integration of WGS assemblies and sequence-based linkage maps, discuss challenges for further improvement of the methodology and outline possible applications building on genetically anchored WGS assemblies.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research, Stadt Seeland, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research, Stadt Seeland, Germany
|