1. Giorgashvili E, Reichel K, Caswara C, Kerimov V, Borsch T, Gruenstaeudl M. Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. Frontiers in Plant Science 2022; 13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice.
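The coverage levels compared in this abstract (2,000x down to 50x) imply subsampling a read set to a target depth. The following is a minimal sketch of that arithmetic, assuming mean coverage is estimated as total sequenced bases divided by genome length. The function names and the example read count, read length, and genome size are hypothetical and are not taken from the study.

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Estimate mean coverage as total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


def downsampling_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to reach the target coverage (capped at 1.0)."""
    if current_coverage <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_coverage / current_coverage)


# Hypothetical example values, for illustration only.
original = mean_coverage(num_reads=4_000_000, read_length=150, genome_length=160_000)
for target in (2000, 1000, 500, 250, 100, 50):
    fraction = downsampling_fraction(original, target)
    print(f"{target}x -> keep {fraction:.4f} of reads")
```

The resulting fraction could then be passed to whichever read-subsampling tool is in use; which tool the authors used is not stated in the abstract.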
Affiliation(s)
- Eka Giorgashvili
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Katja Reichel
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Calvinna Caswara
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Vuqar Kerimov
  - Institute of Botany, Azerbaijan National Academy of Sciences (ANAS), Baku, Azerbaijan
- Thomas Borsch
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
  - Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Michael Gruenstaeudl
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
2. Travadi T, Sharma S, Pandit R, Nakrani M, Joshi C, Joshi M. A duplex PCR assay for authentication of Ocimum basilicum L. and Ocimum tenuiflorum L. in Tulsi churna. Food Control 2022. [DOI: 10.1016/j.foodcont.2021.108790] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
3. Ye H, Liu H, Hu G, Zhao P. The complete chloroplast genome sequence of Paphiopedilum henryanum (Orchidaceae). Mitochondrial DNA B Resour 2022; 7:1174-1176. [PMID: 35935683 PMCID: PMC9354640 DOI: 10.1080/23802359.2022.2088310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Hang Ye
  - Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an, China
- Hengzhao Liu
  - Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an, China
- Guojia Hu
  - Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an, China
- Peng Zhao
  - Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an, China
4. Pezoa I, Villacreses J, Rubilar M, Pizarro C, Galleguillos MJ, Ejsmentewicz T, Fonseca B, Espejo J, Polanco V, Sánchez C. Generation of Chloroplast Molecular Markers to Differentiate Sophora toromiro and Its Hybrids as a First Approach to Its Reintroduction in Rapa Nui (Easter Island). Plants (Basel, Switzerland) 2021; 10:342. [PMID: 33578941 PMCID: PMC7916652 DOI: 10.3390/plants10020342] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/26/2021] [Revised: 02/06/2021] [Accepted: 02/07/2021] [Indexed: 05/03/2023]
Abstract
Sophora toromiro is an endemic tree of Rapa Nui with religious and cultural relevance that, despite being extinct in the wild, still persists in botanical gardens and private collections around the world. The authenticity of some toromiro trees has been questioned because the similarities among hybrid lines lead to misclassification of the species. The toromiro conservation program aims to reintroduce the species to Rapa Nui, which requires exact genotyping and certification of the selected plants. In this study, we present for the first time the complete chloroplast genomes of S. toromiro and four other Sophora specimens, which were sequenced de novo and assembled after mapping the raw reads to a chloroplast database. The length of the chloroplast genomes ranges from 154,239 to 154,473 bp. A total of 130-143 simple sequence repeat (SSR) loci and 577 single nucleotide polymorphisms (SNPs) were identified.
Affiliation(s)
- Ignacio Pezoa
  - School of Biotechnology, Universidad Mayor, Santiago 8580745, Chile
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
  - Network Biology Laboratory, Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago 8580745, Chile
- Javier Villacreses
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
  - Network Biology Laboratory, Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago 8580745, Chile
  - PhD Program in Integrative Genomics, Universidad Mayor, Santiago 8580745, Chile
- Miguel Rubilar
  - PhD Program in Integrative Genomics, Universidad Mayor, Santiago 8580745, Chile
- Carolina Pizarro
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
- María Jesús Galleguillos
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
- Troy Ejsmentewicz
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
- Beatriz Fonseca
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
- Jaime Espejo
  - National Botanic Garden of Viña del Mar, Valparaíso 2561881, Chile
- Víctor Polanco
  - School of Biotechnology, Universidad Mayor, Santiago 8580745, Chile
- Carolina Sánchez
  - Advanced Genomics Core, Universidad Mayor, Santiago 8580745, Chile
  - Applied Genomics Laboratory, Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago 8580745, Chile
  - Correspondence: Tel.: +56-2-2328-1305
5. Yang Z, Li H, Jia Y, Zheng Y, Meng H, Bao T, Li X, Luo L. Intrinsic laws of k-mer spectra of genome sequences and evolution mechanism of genomes. BMC Evol Biol 2020; 20:157. [PMID: 33228538 PMCID: PMC7684957 DOI: 10.1186/s12862-020-01723-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/04/2020] [Accepted: 11/10/2020] [Indexed: 11/17/2022]
Abstract
Background: K-mer spectra of DNA sequences contain important information about sequence composition and sequence evolution. We aim to reveal the evolution rules of genome sequences by studying their k-mer spectra. Results: The intrinsic laws of the k-mer spectra of 920 genome sequences, from primates to prokaryotes, were analyzed. We found that there are two types of evolutionary selection modes in genome sequences, named CG Independent Selection and TA Independent Selection. There is a mutual inhibition relationship between the CG and TA independent selections. We found that the intensity of the CG and TA independent selections correlates closely with genome evolution and with the G + C content of genome sequences. The living habits of species are closely related to the independent selection modes adopted by their genomes. Consequently, we proposed an evolution mechanism of genomes in which genome evolution is determined by the intensities of the CG and TA independent selections and their mutual inhibition relationship. In addition, based on this mechanism, we speculated on the evolution modes of prokaryotes in mild and extreme environments in the anaerobic age, on the evolution of prokaryotes from anaerobic to aerobic environments on Earth, and on the origins of different eukaryotes. Conclusion: We found that there are two independent selection modes in genome sequences. The evolution of genome sequences is determined by the two independent selection modes and the mutual inhibition relationship between them.
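As a rough illustration of what a k-mer spectrum is, the sketch below counts overlapping k-mers in a sequence and then tabulates how many distinct k-mers occur a given number of times. This treats the spectrum as a plain frequency distribution, which is an assumption; the study may define or group its spectra differently (for example by CG or TA dinucleotide content). The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer made only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    return counts


def kmer_spectrum(sequence: str, k: int) -> dict:
    """Map each occurrence count to the number of distinct k-mers with that count."""
    return dict(Counter(kmer_counts(sequence, k).values()))


toy = "ACGTACGTAC"  # toy sequence, for illustration only
print(kmer_counts(toy, 2))
print(kmer_spectrum(toy, 2))
```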
Affiliation(s)
- Zhenhua Yang
  - Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot, 010021, China
  - School of Economics and Management, Inner Mongolia University of Science & Technology, Baotou, 014010, China
- Hong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot, 010021, China
- Yun Jia
  - College of Science, Inner Mongolia University of Technology, Hohhot, 010051, China
- Yan Zheng
  - Baotou Medical College, Inner Mongolia University of Science & Technology, Baotou, 014040, China
- Hu Meng
  - School of Life Science & Technology, Inner Mongolia University of Science & Technology, Baotou, 014010, China
- Tonglaga Bao
  - Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot, 010021, China
- Xiaolong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot, 010021, China
- Liaofu Luo
  - Laboratory of Theoretical Biophysics, School of Physical Science & Technology, Inner Mongolia University, Hohhot, 010021, China
6. Freudenthal JA, Pfaff S, Terhoeven N, Korte A, Ankenbrand MJ, Förster F. A systematic comparison of chloroplast genome assembly tools. Genome Biol 2020; 21:254. [PMID: 32988404 PMCID: PMC7520963 DOI: 10.1186/s13059-020-02153-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/20/2020] [Accepted: 08/22/2020] [Indexed: 01/23/2023]
Abstract
BACKGROUND: Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology or phylogenetics. Different assemblers that are able to assess the plastid genome have been developed. These assemblers often use data from whole-genome sequencing experiments, which usually contain reads from the complete chloroplast genome. RESULTS: The performance of different assembly tools has never been systematically compared. Here, we present a benchmark of seven chloroplast assembly tools, capable of succeeding in more than 60% of known real data sets. Our results show significant differences between the tested assemblers in terms of generating whole chloroplast genome sequences and computational requirements. The examination of 105 data sets from species with unknown plastid genomes leads to the assembly of 20 novel chloroplast genomes. CONCLUSIONS: We create Docker images for each tested tool that are freely available to the scientific community and ensure reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.
Affiliation(s)
- Jan A. Freudenthal
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn Germany
- Simon Pfaff
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
  - Department of Bioinformatics, University of Würzburg, Biozentrum, Am Hubland, Würzburg, 97074 Germany
- Niklas Terhoeven
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
- Markus J. Ankenbrand
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn Germany
  - Chair of Cellular and Molecular Imaging, Comprehensive Heart Failure Center, University Hospital Würzburg, Josef-Schneider-Str. 2, Würzburg, 97080 Germany
- Frank Förster
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074 Germany
  - Department of Bioinformatics, University of Würzburg, Biozentrum, Am Hubland, Würzburg, 97074 Germany
  - Fraunhofer IME-BR, Ohlebergsweg 12, Gießen, 35392 Germany
  - Bioinformatics Core Facility of the University of Gießen, Heinrich-Buff-Ring 58, Gießen, 35392 Germany
7. Freudenthal JA, Pfaff S, Terhoeven N, Korte A, Ankenbrand MJ, Förster F. A systematic comparison of chloroplast genome assembly tools. Genome Biol 2020; 21:254. [PMID: 32988404 DOI: 10.1101/665869] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Accepted: 08/22/2020] [Indexed: 05/21/2023]
Affiliation(s)
- Jan A Freudenthal
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
- Simon Pfaff
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn, Germany
  - Chair of Cellular and Molecular Imaging, Comprehensive Heart Failure Center, University Hospital Würzburg, Josef-Schneider-Str. 2, Würzburg, 97080, Germany
- Niklas Terhoeven
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn, Germany
- Arthur Korte
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
  - Department of Bioinformatics, University of Würzburg, Biozentrum, Am Hubland, Würzburg, 97074, Germany
- Markus J Ankenbrand
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
  - AnaLife Data Science, Wiesengrund 16, Würzburg, 97295 Waldbrunn, Germany
- Frank Förster
  - Center for Computational and Theoretical Biology, University of Würzburg, Campus Hubland Nord, Würzburg, 97074, Germany
  - Department of Bioinformatics, University of Würzburg, Biozentrum, Am Hubland, Würzburg, 97074, Germany
  - Fraunhofer IME-BR, Ohlebergsweg 12, Gießen, 35392, Germany
  - Bioinformatics Core Facility of the University of Gießen, Heinrich-Buff-Ring 58, Gießen, 35392, Germany
8. Zhou C, Duarte T, Silvestre R, Rossel G, Mwanga ROM, Khan A, George AW, Fei Z, Yencho GC, Ellis D, Coin LJM. Insights into population structure of East African sweetpotato cultivars from hybrid assembly of chloroplast genomes. Gates Open Res 2020; 2:41. [PMID: 33062940 PMCID: PMC7536352 DOI: 10.12688/gatesopenres.12856.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 07/17/2020] [Indexed: 11/20/2022]
Abstract
Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage (<1×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which 96 are protein-coding genes, 8 are rRNA genes and 48 are tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in the construction of cp genome assemblies, especially in resolving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and phylogenetic relationships of the sweetpotato.
Affiliation(s)
- Chenxi Zhou
  - Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Tania Duarte
  - Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Awais Khan
  - International Potato Center, P.O. Box 1558, Lima 12, Peru
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Geneva, NY, 14456, USA
- Andrew W George
  - Data61, CSIRO, Ecosciences Precinct, Brisbane, QLD, 4102, Australia
- Zhangjun Fei
  - Boyce Thompson Institute, Cornell University, Ithaca, NY, 14853, USA
- G Craig Yencho
  - Department of Horticulture, North Carolina State University, Raleigh, North Carolina, 27695, USA
- David Ellis
  - International Potato Center, P.O. Box 1558, Lima 12, Peru
- Lachlan J M Coin
  - Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
9. Gruenstaeudl M, Jenke N. PACVr: plastome assembly coverage visualization in R. BMC Bioinformatics 2020; 21:207. [PMID: 32448146 PMCID: PMC7245912 DOI: 10.1186/s12859-020-3475-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/06/2019] [Accepted: 03/31/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Plastid genomes typically display a circular, quadripartite structure with two inverted repeat regions, which challenges automatic assembly procedures. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on genome structure and evolution. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome is a critical step, which allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Despite the interplay between genome structure and assembly quality, no contemporary, user-friendly software tool can visualize the coverage depth of a plastid genome assembly while taking its quadripartite genome structure into account. A software tool is needed that fills this void. RESULTS: We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as the individual plastome genes. By using a variable window approach, the tool allows visualizations on different calculation scales. It also confirms sequence equality of, as well as visualizes gene synteny between, the inverted repeat regions of the input genome. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked from a Unix shell, facilitating its use in automated quality control. We illustrate the application of PACVr on four empirical datasets and compare visualizations generated by PACVr with those of alternative software tools.
CONCLUSIONS: PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) gene synteny across the inverted repeat regions. It contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences. The software, example datasets, technical documentation, and a tutorial are available with the package at https://cran.r-project.org/package=PACVr.
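The idea of averaging coverage depth over windows and flagging windows below a user-defined threshold can be sketched as follows. This is a generic Python illustration, not PACVr's R implementation or API; the function names, the non-overlapping window scheme, and the toy depth values are assumptions made for the example.

```python
from statistics import mean


def window_means(depths, window_size):
    """Mean depth for consecutive non-overlapping windows (the last may be shorter)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, min(start + window_size, len(depths)), mean(depths[start:start + window_size]))
        for start in range(0, len(depths), window_size)
    ]


def flag_low_windows(depths, window_size, threshold):
    """Return (start, end) of windows whose mean depth falls below the threshold."""
    return [(s, e) for s, e, m in window_means(depths, window_size) if m < threshold]


# Toy per-base depths, for illustration only.
depths = [30, 32, 31, 29, 5, 4, 6, 5, 28, 30]
print(flag_low_windows(depths, window_size=4, threshold=20))
```

Changing `window_size` changes the scale at which low-coverage regions become visible, which is the general motivation for a variable window approach.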
Affiliation(s)
- Michael Gruenstaeudl
  - Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195 Germany
- Nils Jenke
  - Institut für Bioinformatik, Freie Universität Berlin, Berlin, 14195 Germany
10. Scheunert A, Dorfner M, Lingl T, Oberprieler C. Can we use it? On the utility of de novo and reference-based assembly of Nanopore data for plant plastome sequencing. PLoS One 2020; 15:e0226234. [PMID: 32208422 PMCID: PMC7092973 DOI: 10.1371/journal.pone.0226234] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/20/2019] [Accepted: 02/28/2020] [Indexed: 12/13/2022]
Abstract
The chloroplast genome harbors plenty of valuable information for phylogenetic research. Illumina short-read data is generally used for de novo assembly of whole plastomes. PacBio or Oxford Nanopore long reads are additionally employed in hybrid approaches to enable assembly across the highly similar inverted repeats of a chloroplast genome. Unlike for PacBio, plastome assemblies based solely on Nanopore reads are rarely found, due to their high error rate and non-random error profile. However, the actual quality decline connected to their use has rarely been quantified. Furthermore, no study has employed reference-based assembly using Nanopore reads, which is common with Illumina data. Using Leucanthemum Mill. as an example, we compared the sequence quality of seven chloroplast genome assemblies of the same species, using combinations of two sequencing platforms and three analysis pipelines. In addition, we assessed the factors that might influence Nanopore assembly quality during sequence generation and bioinformatic processing. The consensus sequence derived from de novo assembly of Nanopore data had a sequence identity of 99.59% compared to the Illumina short-read de novo assembly. Most of the errors detected were indels (81.5%), and a large majority of them are part of homopolymer regions. The quality of reference-based assembly is heavily dependent upon the choice of a close-enough reference. When using a reference with 0.83% sequence divergence from the studied species, mapping of Nanopore reads results in a consensus comparable to that from Nanopore de novo assembly, and of only slightly inferior quality compared to a reference-based assembly with Illumina data. For optimal de novo assembly of Nanopore data, appropriate filtering of contaminants and chimeric sequences, as well as employing moderate read coverage, is essential.
Based on these results, we conclude that Nanopore long reads are a suitable alternative to Illumina short reads in plastome phylogenomics. Few errors remain in the finalized assembly, which can be easily masked in phylogenetic analyses without loss in analytical accuracy. The easily applicable and cost-effective technology might warrant more attention by researchers dealing with plant chloroplast genomes.
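Figures such as a 99.59% sequence identity and the share of errors that are indels can be derived from a pairwise alignment of two assemblies. The sketch below shows one way to compute them, assuming the two sequences are already aligned with `-` as the gap character and that every gap column counts as a difference. That identity definition, the function name, and the toy input are assumptions; the study's exact procedure is not described in the abstract.

```python
def identity_and_error_types(aligned_a: str, aligned_b: str):
    """Return (percent identity, mismatch count, indel column count) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column with gaps in both sequences carries no information
        if a == "-" or b == "-":
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    columns = matches + mismatches + indels
    identity = 100.0 * matches / columns if columns else 0.0
    return identity, mismatches, indels


# Toy alignment, for illustration only: one substitution and one indel.
print(identity_and_error_types("ACGTA", "ACCT-"))
```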
Affiliation(s)
- Agnes Scheunert
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Marco Dorfner
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Thomas Lingl
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
- Christoph Oberprieler
  - Evolutionary and Systematic Botany Group, Institute of Plant Sciences, University of Regensburg, Regensburg, Germany
11. Nock CJ, Hardner CM, Montenegro JD, Ahmad Termizi AA, Hayashi S, Playford J, Edwards D, Batley J. Wild Origins of Macadamia Domestication Identified Through Intraspecific Chloroplast Genome Sequencing. Frontiers in Plant Science 2019; 10:334. [PMID: 30949191 PMCID: PMC6438079 DOI: 10.3389/fpls.2019.00334] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/19/2018] [Accepted: 03/04/2019] [Indexed: 05/13/2023]
Abstract
Identifying the geographic origins of crops is important for the conservation and utilization of novel genetic variation. Even so, the origins of many food crops remain elusive. The tree nut crop macadamia has a remarkable domestication history, from subtropical rain forests in Australia through Hawaii to global cultivation all within the last century. The industry is based primarily on Macadamia integrifolia and M. integrifolia-M. tetraphylla hybrid cultivars with Hawaiian cultivars the main contributors to world production. Sequence data from the chloroplast genome assembled using a genome skimming strategy was used to determine population structure among remnant populations of the main progenitor species, M. integrifolia. Phylogenetic analysis of a 506 bp chloroplast SNP alignment from 64 wild and cultivated accessions identified phylogeographic structure and deep divergences between clades providing evidence for historical barriers to seed dispersal. High levels of variation were detected among wild accessions. Most Hawaiian cultivars, however, shared a single chlorotype that was also present at two wild sites at Mooloo and Mt Bauple from the northernmost distribution of the species in south-east Queensland. Our results provide evidence for a maternal genetic bottleneck during early macadamia domestication, and pinpoint the likely source of seed used to develop the Hawaiian cultivars. The extensive variability and structuring of M. integrifolia chloroplast genomic variation detected in this study suggests much unexploited genetic diversity is available for improvement of this recently domesticated crop.
Affiliation(s)
- Catherine J. Nock
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
  - Correspondence: Catherine J. Nock
- Craig M. Hardner
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Ainnatul A. Ahmad Termizi
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD, Australia
- Satomi Hayashi
  - Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD, Australia
- Julia Playford
  - Queensland Department of Environment and Science, Brisbane, QLD, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
12. Zhou C, Duarte T, Silvestre R, Rossel G, Mwanga ROM, Khan A, George AW, Fei Z, Yencho GC, Ellis D, Coin LJM. Insights into population structure of East African sweetpotato cultivars from hybrid assembly of chloroplast genomes. Gates Open Res 2018; 2:41. [PMID: 33062940 PMCID: PMC7536352 DOI: 10.12688/gatesopenres.12856.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/30/2018] [Indexed: 03/31/2024]
Abstract
Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage (<1×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.
Affiliation(s)
- Chenxi Zhou
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Tania Duarte
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Awais Khan
- International Potato Center, P.O. Box 1558, Lima 12, Peru
- Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, Geneva, NY, 14456, USA
- Andrew W. George
- Data61, CSIRO, Ecosciences Precinct, Brisbane, QLD, 4102, Australia
- Zhangjun Fei
- Boyce Thompson Institute, Cornell University, Ithaca, NY, 14853, USA
- G. Craig Yencho
- Department of Horticulture, North Carolina State University, Raleigh, North Carolina, 27695, USA
- David Ellis
- International Potato Center, P.O. Box 1558, Lima 12, Peru
- Lachlan J. M. Coin
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
|
13
|
McKain MR, Johnson MG, Uribe‐Convers S, Eaton D, Yang Y. Practical considerations for plant phylogenomics. APPLICATIONS IN PLANT SCIENCES 2018; 6:e1038. [PMID: 29732268 PMCID: PMC5895195 DOI: 10.1002/aps3.1038] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 01/22/2018] [Accepted: 03/13/2018] [Indexed: 05/10/2023]
Abstract
The past decade has seen a major breakthrough in our ability to easily and inexpensively sequence genome-scale data from diverse lineages. The development of high-throughput sequencing and long-read technologies has ushered in the era of phylogenomics, where hundreds to thousands of nuclear genes and whole organellar genomes are routinely used to reconstruct evolutionary relationships. As a result, understanding which options are best suited for a particular set of questions can be difficult, especially for those just starting in the field. Here, we review the most recent advances in plant phylogenomic methods and make recommendations for project-dependent best practices and considerations. We focus on the costs and benefits of different approaches in regard to the information they provide researchers and the questions they can address. We also highlight unique challenges and opportunities in plant systems, such as polyploidy, reticulate evolution, and the use of herbarium materials, identifying optimal methodologies for each. Finally, we draw attention to lingering challenges in the field of plant phylogenomics, such as reusability of data sets, and look at some up-and-coming technologies that may help propel the field even further.
Affiliation(s)
- Michael R. McKain
- Department of Biological Sciences, The University of Alabama, Box 870344, Tuscaloosa, Alabama 35487, USA
- Matthew G. Johnson
- Department of Biological Sciences, Texas Tech University, 2901 Main Street, Box 43131, Lubbock, Texas 79409, USA
- Simon Uribe‐Convers
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, Michigan 48109, USA
- Deren Eaton
- Department of Ecology, Evolution, and Environmental Biology, Columbia University, 1200 Amsterdam Avenue, New York, New York 10027, USA
- Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota–Twin Cities, 1445 Gortner Avenue, St. Paul, Minnesota 55108, USA
|