1
|
RAD Capture (Rapture): Flexible and Efficient Sequence-Based Genotyping. Genetics 2015; 202:389-400. [PMID: 26715661 DOI: 10.1534/genetics.115.183665] [Citation(s) in RCA: 264] [Impact Index Per Article: 26.4] [Received: 10/15/2015] [Accepted: 12/17/2015] [Indexed: 12/19/2022]
Abstract
Massively parallel sequencing has revolutionized many areas of biology, but sequencing large amounts of DNA in many individuals is cost-prohibitive and unnecessary for many studies. Genomic complexity reduction techniques such as sequence capture and restriction enzyme-based methods enable the analysis of many more individuals per unit cost. Despite their utility, current complexity reduction methods have limitations, especially when large numbers of individuals are analyzed. Here we develop a much improved restriction site-associated DNA (RAD) sequencing protocol and a new method called Rapture (RAD capture). The new RAD protocol improves versatility by separating RAD tag isolation and sequencing library preparation into two distinct steps. This protocol also recovers more unique (nonclonal) RAD fragments, which improves both standard RAD and Rapture analysis. Rapture then uses an in-solution capture of chosen RAD tags to target sequencing reads to desired loci. Rapture combines the benefits of both RAD and sequence capture, i.e., very inexpensive and rapid library preparation for many individuals as well as high specificity in the number and location of genomic loci analyzed. Our results demonstrate that Rapture is a rapid and flexible technology capable of analyzing a very large number of individuals with minimal sequencing and library preparation cost. The methods presented here should improve the efficiency of genetic analysis for many aspects of agricultural, environmental, and biomedical science.
|
Research Support, U.S. Gov't, Non-P.H.S. |
10 |
264 |
2
|
Johnson MG, Pokorny L, Dodsworth S, Botigué LR, Cowan RS, Devault A, Eiserhardt WL, Epitawalage N, Forest F, Kim JT, Leebens-Mack JH, Leitch IJ, Maurin O, Soltis DE, Soltis PS, Wong GKS, Baker WJ, Wickett NJ. A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering. Syst Biol 2019; 68:594-606. [PMID: 30535394 PMCID: PMC6568016 DOI: 10.1093/sysbio/syy086] [Citation(s) in RCA: 251] [Impact Index Per Article: 41.8] [Received: 07/02/2018] [Revised: 11/29/2018] [Accepted: 12/03/2018] [Indexed: 01/31/2023]
Abstract
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
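The k-medoids idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the Hamming distance on pre-aligned sequences, the deterministic starting medoids, the simple alternating update, and the function names `hamming` and `k_medoids` are all assumptions made for illustration. The point is only that each cluster is represented by one of its actual member sequences, which is what makes medoids suitable as probe templates.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched columns between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(seqs: list[str], k: int, max_iter: int = 100) -> list[int]:
    """Return sorted indices of k medoid sequences (simple alternating heuristic)."""
    n = len(seqs)
    dist = [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: start from the first k sequences
    for _ in range(max_iter):
        # Assignment step: each sequence joins its nearest medoid;
        # a medoid always stays in its own cluster, so no cluster is empty.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # summed distance to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy alignment with two obvious groups (hypothetical data).
toy = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC", "GGCC"]
representatives = [toy[i] for i in k_medoids(toy, 2)]
print(representatives)  # ['AAAT', 'GGGC']
```

In this toy case the two selected sequences are the central member of each group, so two templates stand in for six sequences.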
|
research-article |
6 |
251 |
3
|
Van de Weyer AL, Monteiro F, Furzer OJ, Nishimura MT, Cevik V, Witek K, Jones JDG, Dangl JL, Weigel D, Bemm F. A Species-Wide Inventory of NLR Genes and Alleles in Arabidopsis thaliana. Cell 2019; 178:1260-1272.e14. [PMID: 31442410 PMCID: PMC6709784 DOI: 10.1016/j.cell.2019.07.038] [Citation(s) in RCA: 221] [Impact Index Per Article: 36.8] [Received: 03/05/2019] [Revised: 06/13/2019] [Accepted: 07/19/2019] [Indexed: 12/18/2022]
Abstract
Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.
|
research-article |
6 |
221 |
4
|
McKain MR, Johnson MG, Uribe-Convers S, Eaton D, Yang Y. Practical considerations for plant phylogenomics. Applications in Plant Sciences 2018; 6:e1038. [PMID: 29732268 PMCID: PMC5895195 DOI: 10.1002/aps3.1038] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 01/22/2018] [Accepted: 03/13/2018] [Indexed: 05/10/2023]
Abstract
The past decade has seen a major breakthrough in our ability to easily and inexpensively sequence genome-scale data from diverse lineages. The development of high-throughput sequencing and long-read technologies has ushered in the era of phylogenomics, where hundreds to thousands of nuclear genes and whole organellar genomes are routinely used to reconstruct evolutionary relationships. As a result, understanding which options are best suited for a particular set of questions can be difficult, especially for those just starting in the field. Here, we review the most recent advances in plant phylogenomic methods and make recommendations for project-dependent best practices and considerations. We focus on the costs and benefits of different approaches in regard to the information they provide researchers and the questions they can address. We also highlight unique challenges and opportunities in plant systems, such as polyploidy, reticulate evolution, and the use of herbarium materials, identifying optimal methodologies for each. Finally, we draw attention to lingering challenges in the field of plant phylogenomics, such as reusability of data sets, and look at some up-and-coming technologies that may help propel the field even further.
|
Review |
7 |
101 |
5
|
Shearer AE, Black-Ziegelbein EA, Hildebrand MS, Eppsteiner RW, Ravi H, Joshi S, Guiffre AC, Sloan CM, Happe S, Howard SD, Novak B, Deluca AP, Taylor KR, Scheetz TE, Braun TA, Casavant TL, Kimberling WJ, Leproust EM, Smith RJH. Advancing genetic testing for deafness with genomic technology. J Med Genet 2013; 50:627-34. [PMID: 23804846 DOI: 10.1136/jmedgenet-2013-101749] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 11/04/2022]
Abstract
BACKGROUND Non-syndromic hearing loss (NSHL) is the most common sensory impairment in humans. Until recently its extreme genetic heterogeneity precluded comprehensive genetic testing. Using a platform that couples targeted genomic enrichment (TGE) and massively parallel sequencing (MPS) to sequence all exons of all genes implicated in NSHL, we tested 100 persons with presumed genetic NSHL and in so doing established sequencing requirements for maximum sensitivity and defined MPS quality score metrics that obviate Sanger validation of variants. METHODS We examined DNA from 100 sequentially collected probands with presumed genetic NSHL without exclusions due to inheritance, previous genetic testing, or type of hearing loss. We performed TGE using post-capture multiplexing in variable pool sizes followed by Illumina sequencing. We developed a local Galaxy installation on a high performance computing cluster for bioinformatics analysis. RESULTS To obtain maximum variant sensitivity with this platform 3.2-6.3 million total mapped sequencing reads per sample were required. Quality score analysis showed that Sanger validation was not required for 95% of variants. Our overall diagnostic rate was 42%, but this varied by clinical features from 0% for persons with asymmetric hearing loss to 56% for persons with bilateral autosomal recessive NSHL. CONCLUSIONS These findings will direct the use of TGE and MPS strategies for genetic diagnosis for NSHL. Our diagnostic rate highlights the need for further research on genetic deafness focused on novel gene identification and an improved understanding of the role of non-exonic mutations. The unsolved families we have identified provide a valuable resource to address these areas.
|
Research Support, Non-U.S. Gov't |
12 |
91 |
6
|
Breinholt JW, Earl C, Lemmon AR, Lemmon EM, Xiao L, Kawahara AY. Resolving Relationships among the Megadiverse Butterflies and Moths with a Novel Pipeline for Anchored Phylogenomics. Syst Biol 2018; 67:78-93. [PMID: 28472519 DOI: 10.1093/sysbio/syx048] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 07/01/2016] [Accepted: 04/28/2017] [Indexed: 11/12/2022]
Abstract
The advent of next-generation sequencing technology has allowed for the collection of large portions of the genome for phylogenetic analysis. Hybrid enrichment and transcriptomics are two techniques that leverage next-generation sequencing and have shown much promise. However, methods for processing hybrid enrichment data are still limited. We developed a pipeline for anchored hybrid enrichment (AHE) read assembly, orthology determination, contamination screening, and data processing for sequences flanking the target "probe" region. We apply this approach to study the phylogeny of butterflies and moths (Lepidoptera), a megadiverse group of more than 157,000 described species with poorly understood deep-level phylogenetic relationships. We introduce a new, 855 locus AHE kit for Lepidoptera phylogenetics and compare resulting trees to those from transcriptomes. The enrichment kit was designed from existing genomes, transcriptomes, and expressed sequence tags and was used to capture sequence data from 54 species from 23 lepidopteran families. Phylogenies estimated from AHE data were largely congruent with trees generated from transcriptomes, with strong support for relationships at all but the deepest taxonomic levels. We combine AHE and transcriptomic data to generate a new Lepidoptera phylogeny, representing 76 exemplar species in 42 families. The tree provides robust support for many relationships, including those among the seven butterfly families. The addition of AHE data to an existing transcriptomic dataset lowers node support along the Lepidoptera backbone, but firmly places taxa with AHE data on the phylogeny. Combining taxa sequenced for AHE with existing transcriptomes and genomes resulted in a tree with strong support for (Calliduloidea + Gelechioidea + Thyridoidea) + (Papilionoidea + Pyraloidea + Macroheterocera).
To examine the efficacy of AHE at a shallow taxonomic level, phylogenetic analyses were also conducted on a sister group representing a more recent divergence, the Saturniidae and Sphingidae. These analyses utilized sequences from the probe region and data flanking it, nearly doubling the size of the dataset; resulting trees supported new phylogenetic relationships, especially within the Saturniidae and Sphingidae (e.g., Hemarina derived in the latter). We hope that our data processing pipeline, hybrid enrichment gene set, and approach of combining AHE data with transcriptomes will be useful for the broader systematics community.
|
Research Support, U.S. Gov't, Non-P.H.S. |
7 |
76 |
7
|
Linkem CW, Minin VN, Leaché AD. Detecting the Anomaly Zone in Species Trees and Evidence for a Misleading Signal in Higher-Level Skink Phylogeny (Squamata: Scincidae). Syst Biol 2016; 65:465-77. [PMID: 26738927 PMCID: PMC6383586 DOI: 10.1093/sysbio/syw001] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 11/26/2014] [Accepted: 12/29/2015] [Indexed: 01/28/2023]
Abstract
The anomaly zone, defined by the presence of gene tree topologies that are more probable than the true species tree, presents a major challenge to the accurate resolution of many parts of the Tree of Life. This discrepancy can result from consecutive rapid speciation events in the species tree. Similar to the problem of long-branch attraction, including more data via loci concatenation will only reinforce the support for the incorrect species tree. Empirical phylogenetic studies often employ coalescent-based species tree methods to avoid the anomaly zone, but to this point these studies have not had a method for providing any direct evidence that the species tree is actually in the anomaly zone. In this study, we use 16 species of lizards in the family Scincidae to investigate whether nodes that are difficult to resolve place the species tree within the anomaly zone. We analyze new phylogenomic data (429 loci), using both concatenation and coalescent-based species tree estimation, to locate conflicting topological signal. We then use the unifying principle of the anomaly zone, together with estimates of ancestral population sizes and species persistence times, to determine whether the observed phylogenetic conflict is a result of the anomaly zone. We identify at least three regions of the Scincidae phylogeny that provide demographic signatures consistent with the anomaly zone, and this new information helps reconcile the phylogenetic conflict in previously published studies on these lizards. The anomaly zone presents a real problem in phylogenetics, and our new framework for identifying anomalous relationships will help empiricists leverage their resources appropriately for investigating and overcoming this challenge.
|
research-article |
9 |
67 |
8
|
Liquori A, Vaché C, Baux D, Blanchet C, Hamel C, Malcolm S, Koenig M, Claustres M, Roux AF. Whole USH2A Gene Sequencing Identifies Several New Deep Intronic Mutations. Hum Mutat 2015; 37:184-93. [PMID: 26629787 DOI: 10.1002/humu.22926] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 06/22/2015] [Accepted: 10/19/2015] [Indexed: 01/01/2023]
Abstract
Deep intronic mutations leading to pseudoexon (PE) insertions are underestimated and most of these splicing alterations have been identified by transcript analysis, for instance, the first deep intronic mutation in USH2A, the gene most frequently involved in Usher syndrome type II (USH2). Unfortunately, analyzing USH2A transcripts is challenging and for 1.8%-19% of USH2 individuals carrying a single USH2A recessive mutation, a second mutation is yet to be identified. We have developed and validated a DNA next-generation sequencing approach to identify deep intronic variants in USH2A and evaluated their consequences on splicing. Three distinct novel deep intronic mutations have been identified. All were predicted to affect splicing and resulted in the insertion of PEs, as shown by minigene assays. We present a new and attractive strategy to identify deep intronic mutations, when RNA analyses are not possible. Moreover, the bioinformatics pipeline developed is independent of the gene size, implying the possible application of this approach to any disease-linked gene. Finally, an antisense morpholino oligonucleotide tested in vitro for its ability to restore splicing caused by the c.9959-4159A>G mutation provided high inhibition rates, which are indicative of its potential for molecular therapy.
|
Research Support, Non-U.S. Gov't |
10 |
66 |
9
|
Meek MH, Larson WA. The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era. Mol Ecol Resour 2019; 19:795-803. [PMID: 30681776 DOI: 10.1111/1755-0998.12998] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 09/21/2018] [Revised: 01/17/2019] [Accepted: 01/18/2019] [Indexed: 01/21/2023]
Abstract
The genomics revolution has initiated a new era of population genetics where genome-wide data are frequently used to understand complex patterns of population structure and selection. However, the application of genomic tools to inform management and conservation has been somewhat rare outside a few well studied species. Fortunately, two recently developed approaches, amplicon sequencing and sequence capture, have the potential to significantly advance the field of conservation genomics. Here, amplicon sequencing refers to highly multiplexed PCR followed by high-throughput sequencing (e.g., GTseq), and sequence capture refers to using capture probes to isolate loci from reduced-representation libraries (e.g., Rapture). Both approaches allow sequencing of thousands of individuals at relatively low costs, do not require any specialized equipment for library preparation, and generate data that can be analyzed without sophisticated computational infrastructure. Here, we discuss the advantages and disadvantages of each method and provide a decision framework for geneticists who are looking to integrate these methods into their research programme. While it will always be important to consider the specifics of the biological question and system, we believe that amplicon sequencing is best suited for projects aiming to genotype <500 loci on many individuals (>1,500) or for species where continued monitoring is anticipated (e.g., long-term pedigrees). Sequence capture, on the other hand, is best applied to projects including fewer individuals or where >500 loci are required. Both of these techniques should smooth the transition from traditional genetic techniques to genomics, helping to usher in the conservation genomics era.
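The rule of thumb stated at the end of this abstract can be written down as a small function. This is only a sketch of the thresholds quoted above (500 loci, 1,500 individuals); the function name `suggest_method`, its parameters, and the order in which the conditions are checked are assumptions, and the abstract itself stresses that the biological question and system should always be considered.

```python
def suggest_method(n_loci: int, n_individuals: int,
                   long_term_monitoring: bool = False) -> str:
    """Map the abstract's stated thresholds to a suggested genotyping approach."""
    # More than 500 loci required: the abstract points to sequence capture.
    if n_loci > 500:
        return "sequence capture"
    # Fewer loci on many individuals, or continued monitoring: amplicon sequencing.
    # (Checking loci first is an assumption; the abstract does not rank the criteria.)
    if long_term_monitoring or n_individuals > 1500:
        return "amplicon sequencing"
    # Fewer individuals: the abstract again points to sequence capture.
    return "sequence capture"


print(suggest_method(n_loci=200, n_individuals=5000))  # amplicon sequencing
print(suggest_method(n_loci=2000, n_individuals=300))  # sequence capture
```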
|
News |
6 |
63 |
10
|
Dodsworth S, Pokorny L, Johnson MG, Kim JT, Maurin O, Wickett NJ, Forest F, Baker WJ. Hyb-Seq for Flowering Plant Systematics. Trends in Plant Science 2019; 24:887-891. [PMID: 31477409 DOI: 10.1016/j.tplants.2019.07.011] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 07/05/2018] [Revised: 07/15/2019] [Accepted: 07/31/2019] [Indexed: 05/21/2023]
Abstract
High-throughput DNA sequencing (HTS) presents great opportunities for plant systematics, yet genomic complexity needs to be reduced for HTS to be effectively applied. We highlight Hyb-Seq as a promising approach, especially in light of the recent development of probes enriching 353 low-copy nuclear genes from any flowering plant taxon.
|
|
6 |
62 |
11
|
Schiessl S, Samans B, Hüttel B, Reinhard R, Snowdon RJ. Capturing sequence variation among flowering-time regulatory gene homologs in the allopolyploid crop species Brassica napus. Frontiers in Plant Science 2014; 5:404. [PMID: 25202314 PMCID: PMC4142343 DOI: 10.3389/fpls.2014.00404] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 05/26/2014] [Accepted: 07/29/2014] [Indexed: 05/18/2023]
Abstract
Flowering, the transition from the vegetative to the generative phase, is a decisive time point in the lifecycle of a plant. Flowering is controlled by a complex network of transcription factors, photoreceptors, enzymes and miRNAs. In recent years, several studies gave rise to the hypothesis that this network is also strongly involved in the regulation of other important lifecycle processes ranging from germination and seed development through to fundamental developmental and yield-related traits. In the allopolyploid crop species Brassica napus (genome AACC), homoeologous copies of flowering-time regulatory genes are implicated in major phenological variation within the species; however, the extent and control of intraspecific and intergenomic variation among flowering-time regulators is still unclear. To investigate differences among B. napus morphotypes in relation to flowering-time gene variation, we performed targeted deep sequencing of 29 regulatory flowering-time genes in four genetically and phenologically diverse B. napus accessions. The genotype panel included a winter-type oilseed rape, a winter fodder rape, a spring-type oilseed rape (all B. napus ssp. napus) and a swede (B. napus ssp. napobrassica), which show extreme differences in winter-hardiness, vernalization requirement and flowering behavior. A broad range of genetic variation was detected in the targeted genes for the different morphotypes, including non-synonymous SNPs, copy number variation and presence-absence variation. The results suggest that this broad variation in vernalization, clock and signaling genes could be a key driver of morphological differentiation for flowering-related traits in this recent allopolyploid crop species.
|
research-article |
11 |
59 |
12
|
Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW. Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. The New Phytologist 2016; 209:1240-51. [PMID: 26372471 DOI: 10.1111/nph.13643] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 04/01/2015] [Accepted: 08/13/2015] [Indexed: 05/10/2023]
Abstract
Adaptation to climate across latitude and altitude reflects shared climatic constraints, which may lead to parallel adaptation. However, theory predicts that higher gene flow should favor more concentrated genomic architectures, which would lead to fewer locally maladapted recombinants. We used exome capture to resequence the gene space along a latitudinal and two altitudinal transects in the model tree Populus trichocarpa. Adaptive trait phenotyping was coupled with FST outlier tests and sliding window analysis to assess the degree of parallel adaptation as well as the genomic distribution of outlier loci. Up to 51% of outlier loci overlapped between transect pairs and up to 15% of these loci overlapped among all three transects. Genomic clustering of adaptive loci was more pronounced for altitudinal than latitudinal transects. In both altitudinal transects, there was a larger number of these 'islands of divergence', which were on average longer and included several of exceptional physical length. Our results suggest that recapitulation of genetic clines over latitude and altitude involves extensive parallelism, but that steep altitudinal clines generate islands of divergence. This suggests that physical proximity of genes in coadapted complexes may buffer against the movement of maladapted alleles from geographically proximal but climatically distinct populations.
|
|
9 |
52 |
13
|
Heritable Epigenomic Changes to the Maize Methylome Resulting from Tissue Culture. Genetics 2018; 209:983-995. [PMID: 29848487 DOI: 10.1534/genetics.118.300987] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 03/30/2018] [Accepted: 05/26/2018] [Indexed: 12/22/2022]
Abstract
DNA methylation can contribute to the maintenance of genome integrity and regulation of gene expression. In most situations, DNA methylation patterns are inherited quite stably. However, changes in DNA methylation can occur at some loci as a result of tissue culture resulting in somaclonal variation. To investigate heritable epigenetic changes as a consequence of tissue culture, a sequence-capture bisulfite sequencing approach was implemented to monitor context-specific DNA methylation patterns in ∼15 Mb of the maize genome for a population of plants that had been regenerated from tissue culture. Plants that have been regenerated from tissue culture exhibit gains and losses of DNA methylation at a subset of genomic regions. There was evidence for a high rate of homozygous changes to DNA methylation levels that occur consistently in multiple independent tissue culture lines, suggesting that some loci are either targeted or hotspots for epigenetic variation. The consistent changes inherited following tissue culture include both gains and losses of DNA methylation and can affect CG, CHG, or both contexts within a region. Only a subset of the tissue culture changes observed in callus plants are observed in the primary regenerants, but the majority of DNA methylation changes present in primary regenerants are passed onto offspring. This study provides insights into the susceptibility of some loci and potential mechanisms that could contribute to altered DNA methylation and epigenetic state that occur during tissue culture in plant species.
|
Research Support, U.S. Gov't, Non-P.H.S. |
7 |
42 |
14
|
Zhou L, Bawa R, Holliday JA. Exome resequencing reveals signatures of demographic and adaptive processes across the genome and range of black cottonwood (Populus trichocarpa). Mol Ecol 2014; 23:2486-99. [PMID: 24750333 DOI: 10.1111/mec.12752] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 09/01/2013] [Revised: 04/09/2014] [Accepted: 04/11/2014] [Indexed: 12/11/2022]
Abstract
Extant variation in temperate and boreal plant species has been influenced by both demographic histories associated with Pleistocene glacial cycles and adaptation to local climate. We used sequence capture to investigate the role of these neutral and adaptive processes in shaping diversity in black cottonwood (Populus trichocarpa). Nucleotide diversity and Tajima's D were lowest at replacement sites and highest at intergenic sites, while LD showed the opposite pattern. With samples grouped into three populations arrayed latitudinally, effective population size was highest in the north, followed by south and centre, and LD was highest in the south followed by the north and centre, suggesting a possible northern glacial refuge. FST outlier analysis revealed that promoter, 5'-UTR and intronic sites were enriched for outliers compared with coding regions, while no outliers were found among intergenic sites. Codon usage bias was evident, and genes with synonymous outliers had 30% higher average expression compared with genes containing replacement outliers. These results suggest divergent selection related to regulation of gene expression is important to local adaptation in P. trichocarpa. Finally, within-population selective sweeps were much more pronounced in the central population than in putative northern and southern refugia, which may reflect the different demographic histories of the populations and concomitant effects on signatures of genetic hitchhiking from standing variation.
|
Research Support, U.S. Gov't, Non-P.H.S. |
11 |
39 |
15
|
Vatanparast M, Powell A, Doyle JJ, Egan AN. Targeting legume loci: A comparison of three methods for target enrichment bait design in Leguminosae phylogenomics. Applications in Plant Sciences 2018; 6:e1036. [PMID: 29732266 PMCID: PMC5895186 DOI: 10.1002/aps3.1036] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 09/26/2017] [Accepted: 02/22/2018] [Indexed: 05/19/2023]
Abstract
PREMISE OF THE STUDY The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. METHODS Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. RESULTS Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. CONCLUSIONS Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.
|
research-article |
7 |
37 |
16
|
Viruel J, Conejero M, Hidalgo O, Pokorny L, Powell RF, Forest F, Kantar MB, Soto Gomez M, Graham SW, Gravendeel B, Wilkin P, Leitch IJ. A Target Capture-Based Method to Estimate Ploidy From Herbarium Specimens. Frontiers in Plant Science 2019; 10:937. [PMID: 31396248 PMCID: PMC6667659 DOI: 10.3389/fpls.2019.00937] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 04/20/2019] [Accepted: 07/04/2019] [Indexed: 05/24/2023]
Abstract
Whole genome duplication (WGD) events are common in many plant lineages, but the ploidy status and possible occurrence of intraspecific ploidy variation are unknown for most species. Standard methods for ploidy determination are chromosome counting and flow cytometry approaches. While flow cytometry approaches typically use fresh tissue, an increasing number of studies have shown that recently dried specimens can be used to yield ploidy data. Recent studies have started to explore whether high-throughput sequencing (HTS) data can be used to assess ploidy levels by analyzing allelic frequencies from single copy nuclear genes. Here, we compare different approaches using a range of yam (Dioscorea) tissues of varying ages, drying methods and quality, including herbarium tissue. Our aims were to: (1) explore the limits of flow cytometry in estimating ploidy level from dried samples, including herbarium vouchers collected between 1831 and 2011, and (2) optimize a HTS-based method to estimate ploidy by considering allelic frequencies from nuclear genes obtained using a target-capture method. We show that, although flow cytometry can be used to estimate ploidy levels from herbarium specimens collected up to fifteen years ago, success rate is low (5.9%). We validated our HTS-based estimates of ploidy using 260 genes by benchmarking with dried samples of species of known ploidy (Dioscorea alata, D. communis, and D. sylvatica). Subsequently, we successfully applied the method to the 85 herbarium samples analyzed with flow cytometry, and successfully provided results for 91.7% of them, comprising species across the phylogenetic tree of Dioscorea. We also explored the limits of using this HTS-based approach for identifying high ploidy levels in herbarium material and the effects of heterozygosity and sequence coverage. 
Overall, we demonstrated that ploidy diversity within and between species may be ascertained from historical collections, allowing the determination of polyploidization events from samples collected up to two centuries ago. This approach has the potential to provide insights into the drivers and dynamics of ploidy level changes during plant evolution and crop domestication.
|
research-article |
6 |
36 |
17
|
Hebert FO, Renaut S, Bernatchez L. Targeted sequence capture and resequencing implies a predominant role of regulatory regions in the divergence of a sympatric lake whitefish species pair (Coregonus clupeaformis). Mol Ecol 2013; 22:4896-914. [PMID: 23962219 DOI: 10.1111/mec.12447] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 03/14/2013] [Revised: 07/03/2013] [Accepted: 07/08/2013] [Indexed: 12/18/2022]
Abstract
Latest technological developments in evolutionary biology bring new challenges in documenting the intricate genetic architecture of species in the process of divergence. Sympatric populations of lake whitefish represent one of the key systems to investigate this issue. Despite the value of random genotype-by-sequencing methods and decreasing cost of sequencing technologies, it remains challenging to investigate variation in coding regions, especially in the case of recently duplicated genomes as in salmonids, as this greatly complicates whole genome resequencing. We thus designed a sequence capture array targeting 2773 annotated genes to document the nature and the extent of genomic divergence between sympatric dwarf and normal whitefish. Among the 2728 genes successfully captured, a total of 2182 coding and 10,415 noncoding putative single-nucleotide polymorphisms (SNPs) were identified after applying a first set of basic filters. A genome scan with a quality-refined selection of 2203 SNPs identified 267 outlier SNPs in 210 candidate genes located in genomic regions potentially involved in whitefish divergence and reproductive isolation. We found highly heterogeneous FST estimates among SNP loci. There was an overall low level of coding polymorphism, with a predominance of noncoding mutations among outliers. The heterogeneous patterns of divergence among loci confirm the porous nature of genomes during speciation with gene flow. Considering that few protein-coding mutations were identified as highly divergent, our results, along with previous transcriptomic studies, imply that changes in regulatory regions most likely had a greater role in the process of whitefish population divergence than protein-coding mutations. This study is the first to demonstrate the efficiency of large-scale targeted resequencing for a nonmodel species with such a large and unsequenced genome.
|
Research Support, U.S. Gov't, Non-P.H.S. |
12 |
33 |
18
|
Baison J, Vidalis A, Zhou L, Chen Z, Li Z, Sillanpää MJ, Bernhardsson C, Scofield D, Forsberg N, Grahn T, Olsson L, Karlsson B, Wu H, Ingvarsson PK, Lundqvist S, Niittylä T, García-Gil MR. Genome-wide association study identified novel candidate loci affecting wood formation in Norway spruce. The Plant Journal: for Cell and Molecular Biology 2019; 100:83-100. [PMID: 31166032 PMCID: PMC6852177 DOI: 10.1111/tpj.14429] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/01/2019] [Revised: 04/16/2019] [Accepted: 05/20/2019] [Indexed: 05/26/2023]
Abstract
Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO-based) association mapping method using a functional multilocus mapping approach that utilizes latent traits, with a stability selection probability method as the hypothesis testing approach to determine a significant quantitative trait locus. The analysis provided 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies.
|
research-article |
6 |
32 |
19
|
de La Harpe M, Hess J, Loiseau O, Salamin N, Lexer C, Paris M. A dedicated target capture approach reveals variable genetic markers across micro- and macro-evolutionary time scales in palms. Mol Ecol Resour 2019; 19:221-234. [PMID: 30240120 DOI: 10.1111/1755-0998.12945] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 04/17/2018] [Revised: 08/15/2018] [Accepted: 08/28/2018] [Indexed: 11/29/2022]
Abstract
Understanding the genetics of biological diversification across micro- and macro-evolutionary time scales is a vibrant field of research for molecular ecologists as rapid advances in sequencing technologies promise to overcome former limitations. In palms, an emblematic, economically and ecologically important plant family with high diversity in the tropics, studies of diversification at the population and species levels are still hampered by a lack of genomic markers suitable for the genotyping of large numbers of recently diverged taxa. To fill this gap, we used a whole genome sequencing approach to develop target sequencing for molecular markers in 4,184 genome regions, including 4,051 genes and 133 non-genic putatively neutral regions. These markers were chosen to cover a wide range of evolutionary rates allowing future studies at the family, genus, species and population levels. Special emphasis was given to the avoidance of copy number variation during marker selection. In addition, a set of 149 well-known sequence regions previously used as phylogenetic markers by the palm biological research community were included in the target regions, to open the possibility to combine and jointly analyse already available data sets with genomic data to be produced with this new toolkit. The bait set was effective for species belonging to all three palm sub-families tested (Arecoideae, Ceroxyloideae and Coryphoideae), with high mapping rates, specificity and efficiency. The number of high-quality single nucleotide polymorphisms (SNPs) detected at both the sub-family and population levels facilitates efficient analyses of genomic diversity across micro- and macro-evolutionary time scales.
|
|
6 |
31 |
20
|
Shee ZQ, Frodin DG, Cámara-Leret R, Pokorny L. Reconstructing the Complex Evolutionary History of the Papuasian Schefflera Radiation Through Herbariomics. Frontiers in Plant Science 2020; 11:258. [PMID: 32265950 PMCID: PMC7099051 DOI: 10.3389/fpls.2020.00258] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 09/03/2019] [Accepted: 02/19/2020] [Indexed: 05/19/2023]
Abstract
With its large proportion of endemic taxa, complex geological past, and location at the confluence of the highly diverse Malesian and Australian floristic regions, Papuasia - the floristic region comprising the Bismarck Archipelago, New Guinea, and the Solomon Islands - represents an ideal natural experiment in plant biogeography. However, scattered knowledge of its flora and limited representation in herbaria have hindered our understanding of the drivers of its diversity. Focusing on the woody angiosperm genus Schefflera (Araliaceae), we ask whether its morphologically defined infrageneric groupings are monophyletic, when these lineages diverged, and where (within Papuasia or elsewhere) they diversified. To address these questions, we use a high-throughput sequencing approach (Hyb-Seq) which combines target capture (with an angiosperm-wide bait kit targeting 353 single-copy nuclear loci) and genome shotgun sequencing (which allows retrieval of regions in high-copy number, e.g., organellar DNA) of historical herbarium collections. To reconstruct the evolutionary history of the genus we reconstruct molecular phylogenies with Bayesian inference, maximum likelihood, and pseudo-coalescent approaches, and co-estimate divergence times and ancestral areas in a Bayesian framework. We find strong support for most infrageneric morphological groupings, as currently circumscribed, and we show the efficacy of the Angiosperms-353 probe kit in resolving both deep and shallow phylogenetic relationships. We infer a sequence of colonization to explain the present-day distribution of Schefflera in Papuasia: from the Sunda Shelf, Schefflera arrived to the Woodlark plate (present-day eastern New Guinea) in the late Oligocene (when most of New Guinea was submerged) and, subsequently (throughout the Miocene), it migrated westwards (to the Maoke and Bird's Head Plates and thereon) and further diversified, in agreement with previous reconstructions.
|
research-article |
5 |
28 |
21
|
Linck EB, Hanna ZR, Sellas A, Dumbacher JP. Evaluating hybridization capture with RAD probes as a tool for museum genomics with historical bird specimens. Ecol Evol 2017; 7:4755-4767. [PMID: 28690805 PMCID: PMC5496524 DOI: 10.1002/ece3.3065] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/22/2017] [Revised: 04/17/2017] [Accepted: 04/19/2017] [Indexed: 12/30/2022] Open
Abstract
Laboratory techniques for high-throughput sequencing have enhanced our ability to generate DNA sequence data from millions of natural history specimens collected prior to the molecular era, but remain poorly tested at shallower evolutionary time scales. Hybridization capture using restriction site-associated DNA probes (hyRAD) is a recently developed method for population genomics with museum specimens. The hyRAD method employs fragments produced in a restriction site-associated double digestion as the basis for probes that capture orthologous loci in samples of interest. While promising in that it does not require a reference genome, hyRAD has yet to be applied across study systems in independent laboratories. Here, we provide an independent assessment of the effectiveness of hyRAD on both fresh avian tissue and dried tissue from museum specimens up to 140 years old and investigate how variable quantities of input DNA affect sequencing, assembly, and population genetic inference. We present a modified bench protocol and bioinformatics pipeline, including three steps for detection and removal of microbial and mitochondrial DNA contaminants. We confirm that hyRAD is an effective tool for sampling thousands of orthologous SNPs from historic museum specimens to describe phylogeographic patterns. We find that modern DNA performs significantly better than historical DNA during sequencing but that assembly performance is largely equivalent. We also find that the quantity of input DNA predicts %GC content of assembled contiguous sequences, suggesting PCR bias. We caution against sampling schemes that include taxonomic or geographic autocorrelation across modern and historic samples.
|
research-article |
8 |
27 |
22
|
Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022] Open
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
|
Comparative Study |
5 |
26 |
23
|
Targeted capture of homoeologous coding and noncoding sequence in polyploid cotton. G3-Genes Genomes Genetics 2012; 2:921-30. [PMID: 22908041 PMCID: PMC3411248 DOI: 10.1534/g3.112.003392] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 04/02/2012] [Accepted: 06/15/2012] [Indexed: 12/30/2022]
Abstract
Targeted sequence capture is a promising technology in many areas in biology. These methods enable efficient and relatively inexpensive sequencing of hundreds to thousands of genes or genomic regions from many more individuals than is practical using whole-genome sequencing approaches. Here, we demonstrate the feasibility of target enrichment using sequence capture in polyploid cotton. To capture and sequence both members of each gene pair (homeologs) of wild and domesticated Gossypium hirsutum, we created custom hybridization probes to target 1000 genes (500 pairs of homeologs) using information from the cotton transcriptome. Two widely divergent samples of G. hirsutum were hybridized to four custom NimbleGen capture arrays containing probes for targeted genes. We show that the two coresident homeologs in the allopolyploid nucleus were efficiently captured with high coverage. The capture efficiency was similar between the two accessions and independent of whether the samples were multiplexed. A significant amount of flanking, nontargeted sequence (untranslated regions and introns) was also captured and sequenced along with the targeted exons. Intraindividual heterozygosity is low in both wild and cultivated Upland cotton, as expected from the high level of inbreeding in natural G. hirsutum and bottlenecks accompanying domestication. In addition, levels of heterozygosity appeared asymmetrical with respect to genome (AT or DT) in cultivated cotton. The approach used here is general, scalable, and may be adapted for many different research inquiries involving polyploid plant genomes.
|
Research Support, U.S. Gov't, Non-P.H.S. |
13 |
26 |
24
|
Guilmatre A, Highnam G, Borel C, Mittelman D, Sharp AJ. Rapid multiplexed genotyping of simple tandem repeats using capture and high-throughput sequencing. Hum Mutat 2013; 34:1304-11. [PMID: 23696428 DOI: 10.1002/humu.22359] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 02/09/2013] [Accepted: 05/07/2013] [Indexed: 11/12/2022]
Abstract
Although simple tandem repeats (STRs) comprise ~2% of the human genome and represent an important source of polymorphism, this class of variation remains understudied. We have developed a cost-effective strategy for performing targeted enrichment of STR regions that utilizes capture probes targeting the flanking sequences of STR loci, enabling specific capture of DNA fragments containing STRs for subsequent high-throughput sequencing. Utilizing a capture design targeting 6,243 STR loci <94 bp and multiplexing eight individuals in a single Illumina HiSeq2000 sequencing lane we were able to call genotypes in at least one individual for 67.5% of the targeted STRs. We observed a strong relationship between (G+C) content and genotyping rate. STRs with moderate (G+C) content were recovered with >90% success rate, whereas only 12% of STRs with ≥ 80% (G+C) were genotyped in our assay. Analysis of a parent-offspring trio, complete hydatidiform mole samples, repeat analyses of the same individual, and Sanger sequencing-based validation indicated genotyping error rates between 7.6% and 12.4%. The majority of such errors were a single repeat unit at mono- or dinucleotide repeats. Altogether, our STR capture assay represents a cost-effective method that enables multiplexed genotyping of thousands of STR loci suitable for large-scale population studies.
|
Research Support, Non-U.S. Gov't |
12 |
25 |
25
|
Schiessl SV, Huettel B, Kuehn D, Reinhardt R, Snowdon RJ. Flowering Time Gene Variation in Brassica Species Shows Evolutionary Principles. Frontiers in Plant Science 2017; 8:1742. [PMID: 29089948 PMCID: PMC5651034 DOI: 10.3389/fpls.2017.01742] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/28/2017] [Accepted: 09/25/2017] [Indexed: 05/02/2023]
Abstract
Flowering time genes have a strong influence on successful reproduction and life cycle adaptation. However, their regulation is highly complex and only well understood in diploid model systems. For crops with a polyploid background from the genus Brassica, data on flowering time gene variation are scarce, although indispensable for modern breeding techniques like marker-assisted breeding. We have deep-sequenced all paralogs of 35 Arabidopsis thaliana flowering regulators using Sequence Capture followed by Illumina sequencing in two selected accessions of the vegetable species Brassica rapa and Brassica oleracea, respectively. Using these data, we were able to call SNPs, InDels and copy number variations (CNVs) for genes from the total flowering time network including central flowering regulators, but also genes from the vernalisation pathway, the photoperiod pathway, temperature regulation, the circadian clock and the downstream effectors. Comparing the results to a complementary data set from the allotetraploid species Brassica napus, we detected rearrangements in B. napus which probably occurred early after the allopolyploidisation event. These data are both a valuable resource for flowering time research in those vegetable species and a contribution to speciation genetics.
|
research-article |
8 |
21 |