201
Jagielski T, Gawor J, Bakuła Z, Zuchniewicz K, Żak I, Gromadka R. An optimized method for high quality DNA extraction from microalga Prototheca wickerhamii for genome sequencing. Plant Methods 2017; 13:77. [PMID: 29026433 PMCID: PMC5627410 DOI: 10.1186/s13007-017-0228-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 03/21/2017] [Accepted: 09/19/2017] [Indexed: 05/25/2023]
Abstract
BACKGROUND: The complex cell wall structure of algae often precludes efficient extraction of their genetic material. The purpose of this study was to design a next-generation sequencing-suitable DNA isolation method for unicellular, achlorophyllous, yeast-like microalgae of the genus Prototheca, the only known plant pathogens of both humans and animals. The effectiveness of the newly proposed scheme was compared with five other, previously described methods commonly used for DNA isolation from plants and/or yeasts, available either as laboratory-developed, in-house assays based on liquid nitrogen grinding or different enzymatic digestion, or as commercially manufactured kits. RESULTS: All five previously described isolation assays yielded DNA concentrations lower than those obtained with the new method, averaging 16.15 ± 25.39 vs 74.2 ± 0.56 ng/µL, respectively. The new method was also superior in terms of DNA purity, as measured by A260/A280 (-0.41 ± 4.26 vs 2.02 ± 0.03) and A260/A230 (1.20 ± 1.12 vs 1.97 ± 0.07) ratios. Only the liquid nitrogen-based method yielded DNA of comparable quantity (60.96 ± 0.16 ng/µL) and quality (A260/A280 = 2.08 ± 0.02; A260/A230 = 2.23 ± 0.26). Still, DNA isolated with the new method showed higher integrity, which was best illustrated upon electrophoretic analysis. Genomic DNA of the Prototheca wickerhamii POL-1 strain isolated with the protocol proposed herein was successfully sequenced on the Illumina MiSeq platform. CONCLUSIONS: A new method for DNA isolation from Prototheca algae is described. The method, whose protocol involves glass bead pulverization and cesium chloride (CsCl) density gradient centrifugation, was demonstrated to be superior to the other common assays in terms of DNA quantity and quality. The method is also the first to offer the possibility of preparing a DNA template suitable for whole genome sequencing of Prototheca spp.
Affiliation(s)
- Tomasz Jagielski
- Department of Applied Microbiology, Institute of Microbiology, Faculty of Biology, University of Warsaw, I. Miecznikowa 1, 02-096 Warsaw, Poland
- Jan Gawor
- DNA Sequencing and Oligonucleotides Synthesis Laboratory at the Institute of Biochemistry and Biophysics, Polish Academy of Sciences, A. Pawińskiego 5a, 02-106 Warsaw, Poland
- Zofia Bakuła
- Department of Applied Microbiology, Institute of Microbiology, Faculty of Biology, University of Warsaw, I. Miecznikowa 1, 02-096 Warsaw, Poland
- Karolina Zuchniewicz
- DNA Sequencing and Oligonucleotides Synthesis Laboratory at the Institute of Biochemistry and Biophysics, Polish Academy of Sciences, A. Pawińskiego 5a, 02-106 Warsaw, Poland
- Iwona Żak
- Department of Clinical Microbiology, Children’s University Hospital of Cracow, Kraków, Poland
- Robert Gromadka
- DNA Sequencing and Oligonucleotides Synthesis Laboratory at the Institute of Biochemistry and Biophysics, Polish Academy of Sciences, A. Pawińskiego 5a, 02-106 Warsaw, Poland
202
Alhakami H, Mirebrahim H, Lonardi S. A comparative evaluation of genome assembly reconciliation tools. Genome Biol 2017; 18:93. [PMID: 28521789 PMCID: PMC5436433 DOI: 10.1186/s13059-017-1213-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 02/24/2017] [Accepted: 04/12/2017] [Indexed: 11/17/2022]
Abstract
Background: The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation. Results: Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input. Conclusions: None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assemblies to being comparable to the worst of the input assemblies.
Affiliation(s)
- Hind Alhakami
- Department of Computer Science & Engineering, University of California, 900 University Avenue, Riverside, 92521, CA, USA
- Hamid Mirebrahim
- Department of Computer Science & Engineering, University of California, 900 University Avenue, Riverside, 92521, CA, USA
- Stefano Lonardi
- Department of Computer Science & Engineering, University of California, 900 University Avenue, Riverside, 92521, CA, USA
203
Abstract
There is great potential for genome sequencing to enhance patient care through improved diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential, genomics strategies that have been developed for genetic discovery - including DNA-sequencing technologies and analysis algorithms - need to be adapted to fit clinical needs. This will require the optimization of alignment algorithms, attention to quality-coverage metrics, tailored solutions for paralogous or low-complexity areas of the genome, and the adoption of consensus standards for variant calling and interpretation. Global sharing of this more accurate genotypic and phenotypic data will accelerate the determination of causality for novel genes or variants. Thus, a deeper understanding of disease will be realized that will allow its targeting with much greater therapeutic precision.
Affiliation(s)
- Euan A Ashley
- Center for Inherited Cardiovascular Disease, Falk Cardiovascular Research Building, Stanford Medicine, 870 Quarry Road, Stanford, California 94305, USA
204
Liem M, Jansen HJ, Dirks RP, Henkel CV, van Heusden GPH, Lemmers RJLF, Omer T, Shao S, Punt PJ, Spaink HP. De novo whole-genome assembly of a wild type yeast isolate using nanopore sequencing. F1000Res 2017; 6:618. [PMID: 30135709 PMCID: PMC6081980 DOI: 10.12688/f1000research.11146.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 07/09/2018] [Indexed: 11/20/2022]
Abstract
Background: The introduction of the MinION sequencing device by Oxford Nanopore Technologies may greatly accelerate whole genome sequencing. Nanopore sequence data offers great potential for de novo assembly of complex genomes without using other technologies. Furthermore, Nanopore data combined with other sequencing technologies is highly useful for accurate annotation of all genes in the genome. In this manuscript we used nanopore sequencing as a tool to classify yeast strains. Methods: We compared various technical and software developments for the nanopore sequencing protocol, showing that the R9 chemistry is, as predicted, higher in quality than R7.3 chemistry. The R9 chemistry is an essential improvement for assembly of the extremely AT-rich mitochondrial genome. We double corrected assemblies from four different assemblers with PILON and assessed sequence correctness before and after PILON correction with a set of 290 Fungi genes using BUSCO. Results: In this study, we used this new technology to sequence and de novo assemble the genome of a recently isolated ethanologenic yeast strain, and compared the results with those obtained by classical Illumina short read sequencing. This strain was originally named Candida vartiovaarae (Torulopsis vartiovaarae) based on ribosomal RNA sequencing. We show that the assembly using nanopore data is much more contiguous than the assembly using short read data. We also compared various technical and software developments for the nanopore sequencing protocol, showing that nanopore-derived assemblies provide the highest contiguity. Conclusions: The mitochondrial and chromosomal genome sequences showed that our strain is clearly distinct from other yeast taxa and most closely related to published Cyberlindnera species. In conclusion, MinION-mediated long read sequencing can be used for high quality de novo assembly of new eukaryotic microbial genomes.
Affiliation(s)
- Michael Liem
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
- Hans J Jansen
- Future Genomics Technologies B.V., Leiden, 2333 BE, Netherlands
- Ron P Dirks
- Future Genomics Technologies B.V., Leiden, 2333 BE, Netherlands
- Richard J L F Lemmers
- Department of Human Genetics, Leiden University Medical Center, Leiden, 2333 ZA, Netherlands
- Trifa Omer
- Dutch DNA Biotech B.V., Utrecht, 3584 CH, Netherlands
- Shuai Shao
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
- Peter J Punt
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands; Dutch DNA Biotech B.V., Utrecht, 3584 CH, Netherlands
- Herman P Spaink
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
205
Liem M, Jansen HJ, Dirks RP, Henkel CV, van Heusden GPH, Lemmers RJLF, Omer T, Shao S, Punt PJ, Spaink HP. De novo whole-genome assembly of a wild type yeast isolate using nanopore sequencing. F1000Res 2017; 6:618. [PMID: 30135709 DOI: 10.12688/f1000research.11146.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 04/18/2017] [Indexed: 11/20/2022]
Abstract
Background: The introduction of the MinION sequencing device by Oxford Nanopore Technologies may greatly accelerate whole genome sequencing. Nanopore sequence data offers great potential for de novo assembly of complex genomes without using other technologies. Furthermore, Nanopore data combined with other sequencing technologies is highly useful for accurate annotation of all genes in the genome. In this manuscript we used nanopore sequencing as a tool to classify yeast strains. Methods: We compared various technical and software developments for the nanopore sequencing protocol, showing that the R9 chemistry is, as predicted, higher in quality than R7.3 chemistry. The R9 chemistry is an essential improvement for assembly of the extremely AT-rich mitochondrial genome. We double corrected assemblies from four different assemblers with PILON and assessed sequence correctness before and after PILON correction with a set of 290 Fungi genes using BUSCO. Results: In this study, we used this new technology to sequence and de novo assemble the genome of a recently isolated ethanologenic yeast strain, and compared the results with those obtained by classical Illumina short read sequencing. This strain was originally named Candida vartiovaarae (Torulopsis vartiovaarae) based on ribosomal RNA sequencing. We show that the assembly using nanopore data is much more contiguous than the assembly using short read data. We also compared various technical and software developments for the nanopore sequencing protocol, showing that nanopore-derived assemblies provide the highest contiguity. Conclusions: The mitochondrial and chromosomal genome sequences showed that our strain is clearly distinct from other yeast taxa and most closely related to published Cyberlindnera species. In conclusion, MinION-mediated long read sequencing can be used for high quality de novo assembly of new eukaryotic microbial genomes.
Affiliation(s)
- Michael Liem
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
- Hans J Jansen
- Future Genomics Technologies B.V., Leiden, 2333 BE, Netherlands
- Ron P Dirks
- Future Genomics Technologies B.V., Leiden, 2333 BE, Netherlands
- Richard J L F Lemmers
- Department of Human Genetics, Leiden University Medical Center, Leiden, 2333 ZA, Netherlands
- Trifa Omer
- Dutch DNA Biotech B.V., Utrecht, 3584 CH, Netherlands
- Shuai Shao
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
- Peter J Punt
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands; Dutch DNA Biotech B.V., Utrecht, 3584 CH, Netherlands
- Herman P Spaink
- Institute of Biology, Leiden University, Leiden, 2300 RA, Netherlands
206
Khost DE, Eickbush DG, Larracuente AM. Single-molecule sequencing resolves the detailed structure of complex satellite DNA loci in Drosophila melanogaster. Genome Res 2017; 27:709-721. [PMID: 28373483 PMCID: PMC5411766 DOI: 10.1101/gr.213512.116] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 07/27/2016] [Accepted: 03/15/2017] [Indexed: 12/21/2022]
Abstract
Highly repetitive satellite DNA (satDNA) repeats are found in most eukaryotic genomes. SatDNAs are rapidly evolving and have roles in genome stability and chromosome segregation. Their repetitive nature poses a challenge for genome assembly and makes progress on the detailed study of satDNA structure difficult. Here, we use single-molecule sequencing long reads from Pacific Biosciences (PacBio) to determine the detailed structure of all major autosomal complex satDNA loci in Drosophila melanogaster, with a particular focus on the 260-bp and Responder satellites. We determine the optimal de novo assembly methods and parameter combinations required to produce a high-quality assembly of these previously unassembled satDNA loci and validate this assembly using molecular and computational approaches. We determined that the computationally intensive PBcR-BLASR assembly pipeline yielded better assemblies than the faster and more efficient pipelines based on the MHAP hashing algorithm, and that it is essential to validate assemblies of repetitive loci. The assemblies reveal that satDNA repeats are organized into large arrays interrupted by transposable elements. The repeats in the center of the array tend to be homogenized in sequence, suggesting that gene conversion and unequal crossovers lead to repeat homogenization through concerted evolution, although the degree of unequal crossing over may differ among complex satellite loci. We find evidence for higher-order structure within satDNA arrays that suggests recent structural rearrangements. These assemblies provide a platform for the evolutionary and functional genomics of satDNAs in pericentric heterochromatin.
Affiliation(s)
- Daniel E Khost
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- Danna G Eickbush
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
207
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 2017; 27:722-736. [PMID: 28298431 PMCID: PMC5411767 DOI: 10.1101/gr.215087.116] [Citation(s) in RCA: 4314] [Impact Index Per Article: 616.3] [Received: 08/23/2016] [Accepted: 03/03/2017] [Indexed: 12/11/2022]
Abstract
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
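The abstract above mentions an overlapping strategy based on tf-idf weighted MinHash. As a rough illustration of the underlying idea only, the following sketch estimates the Jaccard similarity between the k-mer sets of two sequences with plain, unweighted MinHash. It is not Canu's implementation: the tf-idf weighting is omitted, and the function names, the choice of k, and the number of hash functions are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-length substrings of seq (requires len(seq) >= k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(items, num_hashes=64):
    """Keep, for each of num_hashes salted hash functions, the minimum hash value."""
    sketch = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # hypothetical seeding scheme
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions whose minima agree; this estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


read_a = minhash_sketch(kmers("ACGTACGTGACCTTGA", 4))
read_b = minhash_sketch(kmers("ACGTACGTGACCTTGC", 4))
print(estimate_jaccard(read_a, read_b))
```

Two sequences that share most of their k-mers give an estimate close to 1, which is why such sketches can be used to shortlist candidate overlaps cheaply before a more careful alignment.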
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Jason R Miller
- J. Craig Venter Institute, Rockville, Maryland 20850, USA
- Nicholas H Bergman
- National Biodefense Analysis and Countermeasures Center, Frederick, Maryland 21702, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
208
Abstract
Long-read sequencing technology promises to greatly enhance de novo assembly of genomes for nonmodel species. Although the error rates of long reads have been a stumbling block, sequencing at high coverage permits the self-correction of many errors. Here, we sequence and de novo assemble the genome of Drosophila serrata, a species from the montium subgroup that has been well-studied for latitudinal clines, sexual selection, and gene expression, but which lacks a reference genome. Using 11 PacBio single-molecule real-time (SMRT) cells, we generated 12 Gbp of raw sequence data comprising ∼65× whole-genome coverage. Read lengths averaged 8940 bp (NRead50 12,200) with the longest read at 53 kbp. We self-corrected reads using the PBDagCon algorithm and assembled the genome using the MHAP algorithm within the PBcR assembler. Total genome length was 198 Mbp with an N50 just under 1 Mbp. Contigs displayed a high degree of chromosome arm-level conservation with the D. melanogaster genome and many could be sensibly placed on the D. serrata physical map. We also provide an initial annotation for this genome using in silico gene predictions that were supported by RNA-seq data.
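The abstract above reports contiguity as an N50. For readers unfamiliar with the statistic, here is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy contig lengths are assumptions for illustration and are unrelated to the D. serrata data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty collection")


toy_contigs = [80, 70, 50, 40, 30, 20]  # hypothetical lengths, total 290
print(n50(toy_contigs))
```

Here the two longest contigs already sum to 150, more than half of 290, so the N50 is the length of the second contig, 70.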
209
Draft Genome Sequence of the Oleaginous Green Alga Tetradesmus obliquus UTEX 393. Genome Announcements 2017; 5:5/3/e01449-16. [PMID: 28104651 PMCID: PMC5255914 DOI: 10.1128/genomea.01449-16] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]
Abstract
The microalga Tetradesmus obliquus is able to maintain a high photosynthetic efficiency under nitrogen limitation and is considered a promising green microalga for sustainable production of diverse compounds, including biofuels. Here, we report the first draft whole-genome shotgun sequencing of T. obliquus. The final assembly comprises 108,715,903 bp with over 1,368 scaffolds.
210
Zhou X, Peris D, Kominek J, Kurtzman CP, Hittinger CT, Rokas A. In Silico Whole Genome Sequencer and Analyzer (iWGS): a Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies. G3 (Bethesda, Md.) 2016; 6:3655-3662. [PMID: 27638685 PMCID: PMC5100864 DOI: 10.1534/g3.116.034249] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/04/2016] [Accepted: 09/08/2016] [Indexed: 11/18/2022]
Abstract
The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
Affiliation(s)
- Xiaofan Zhou
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235
- David Peris
- Laboratory of Genetics, Genome Center of Wisconsin, Department of Energy Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Wisconsin 53706
- Jacek Kominek
- Laboratory of Genetics, Genome Center of Wisconsin, Department of Energy Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Wisconsin 53706
- Cletus P Kurtzman
- Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agricultural Utilization Research, Agricultural Research Service, US Department of Agriculture, Peoria, Illinois 61604
- Chris Todd Hittinger
- Laboratory of Genetics, Genome Center of Wisconsin, Department of Energy Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Wisconsin 53706
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235