76
Galvez LC, Koh RBL, Barbosa CFC, Asunto JC, Catalla JL, Atienza RG, Costales KT, Aquino VM, Zhang D. Sequencing and de Novo Assembly of Abaca (Musa textilis Née) var. Abuab Genome. Genes (Basel) 2021; 12:genes12081202. [PMID: 34440376 PMCID: PMC8392402 DOI: 10.3390/genes12081202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2021] [Revised: 06/02/2021] [Accepted: 07/01/2021] [Indexed: 01/14/2023]
Abstract
Abaca (Musa textilis Née), a crop indigenous to the Philippines, is known to be the source of the strongest natural fiber. Despite its huge economic contributions, research on crop improvement is limited due to the lack of genomic data. In this study, the whole genome of abaca var. Abuab was sequenced using the Illumina NovaSeq 6000 and Pacific Biosciences Single-Molecule Real-Time (SMRT) Sequel platforms. The genome size of Abuab was estimated to be 616 Mbp based on the total k-mer number and the k-mer depth peak. The genome was assembled at 65× depth, covering 95.28% of the estimated genome size. BUSCO analysis recovered 78.2% complete BUSCO genes. A total of 33,277 gene structures were predicted, a number comparable to those predicted for recently assembled Musa spp. genomes. A total of 330 Mbp of repetitive elements were also mined, accounting for 53.6% of the genome length. Here we report the sequencing and genome assembly of abaca var. Abuab, which will facilitate gene discovery for crop improvement and serve as an indispensable resource for genetic diversity studies in Musa.
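The k-mer approach mentioned above divides the total number of k-mers by the depth at which the main k-mer peak occurs. The sketch below is a minimal Python illustration of that formula, not the authors' pipeline. It assumes a k-mer histogram mapping depth to the number of distinct k-mers (the kind of table a k-mer counter reports); the function name, the error cutoff `min_depth` and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as total k-mer count divided by peak k-mer depth.

    histogram: dict mapping k-mer depth -> number of distinct k-mers seen at that depth.
    min_depth: depths below this are treated as sequencing errors (hypothetical cutoff).
    """
    # Drop low-depth k-mers, which are dominated by sequencing errors.
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    # Total k-mer occurrences: each distinct k-mer counts once per time it was seen.
    total_kmers = sum(depth * n for depth, n in kept.items())
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


# Toy histogram (hypothetical numbers): an error spike at depth 1, a main peak at 20.
toy = {1: 5000, 10: 80, 19: 250, 20: 300, 21: 250, 40: 20}
print(estimate_genome_size(toy))  # 17600 k-mers / depth 20 = 880.0
```

In practice the peak is usually located after smoothing or model fitting, and heterozygous genomes show an extra peak at half the main depth, so a plain maximum can pick the wrong peak.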
77
McCartney AM, Hilario E, Choi S, Guhlin J, Prebble JM, Houliston G, Buckley TR, Chagné D. An exploration of assembly strategies and quality metrics on the accuracy of the rewarewa (Knightia excelsa) genome. Mol Ecol Resour 2021; 21:2125-2144. [PMID: 33955186 PMCID: PMC8362059 DOI: 10.1111/1755-0998.13406] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/23/2020] [Revised: 03/18/2021] [Accepted: 04/20/2021] [Indexed: 12/17/2022]
Abstract
We used long-read sequencing data generated from Knightia excelsa, a nectar-producing Proteaceae tree endemic to Aotearoa (New Zealand), to explore how sequencing data type, volume and workflows can affect final assembly accuracy and chromosome reconstruction. Establishing a high-quality genome for this species has specific cultural importance to Māori and commercial importance to honey producers in Aotearoa. Assemblies were produced by five long-read assemblers using data subsampled by read length, two polishing strategies and two Hi-C mapping methods. Subsampling the data by read length showed that each assembler performed differently depending on the coverage and read length of the data, and that input data with longer reads but perhaps lower coverage produced more contiguous, k-mer-complete and gene-complete assemblies than input data with shorter reads and higher coverage. The final genome assembly was scaffolded into 14 pseudochromosomes using an initial Flye long-read assembly, a combined Racon/Medaka/Pilon polishing strategy, SALSA2 and ALLHiC scaffolding, Juicebox curation, and validation against a Macadamia linkage map. We highlighted the importance of developing assembly workflows based on the volume and read length of sequencing data and established a robust set of quality metrics for generating high-quality assemblies. Scaffolding analyses showed that problems found in the initial assemblies could not be resolved accurately with Hi-C data and that scaffolding was more successful when the underlying contig assembly was more accurate. These findings provide insight into how quality assessment tools can be implemented throughout genome assembly pipelines to inform the de novo reconstruction of high-quality genome assemblies for nonmodel organisms.
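The read-length subsampling described above can be illustrated with one simple strategy: keep the longest reads until a target coverage is reached. This is a hypothetical sketch; the abstract does not say exactly how the subsets were drawn, and `subsample_longest` and its parameters are illustrative names only.

```python
def subsample_longest(read_lengths, genome_size, target_coverage):
    """Keep the longest reads until their summed length reaches the target coverage.

    read_lengths: iterable of read lengths in bases.
    genome_size: estimated genome size in bases.
    target_coverage: desired fold coverage (e.g. 30 for 30x).
    Returns the retained read lengths, longest first.
    """
    budget = genome_size * target_coverage  # total bases to retain
    kept, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        kept.append(length)
        total += length
    return kept


reads = [5, 50, 20, 30, 10]
print(subsample_longest(reads, genome_size=10, target_coverage=7))  # [50, 30]
```

Lowering `target_coverage` keeps fewer but longer reads, which is the coverage versus read-length trade-off the study evaluated.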
78
Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 2021; 70:e102. [PMID: 32559359 DOI: 10.1002/cpbi.102] [Citation(s) in RCA: 881] [Impact Index Per Article: 293.7] [Indexed: 11/08/2022]
Abstract
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced from cultivated microbial isolates and for single-cell genomic DNA sequencing. Over time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes, as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results, with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC.
Basic Protocol 1: Assembling isolate bacterial datasets
Basic Protocol 2: Assembling metagenomic datasets
Basic Protocol 3: Assembling sets of putative plasmids
Basic Protocol 4: Assembling transcriptomes
Basic Protocol 5: Assembling putative biosynthetic gene clusters
Support Protocol 1: Installing SPAdes
Support Protocol 2: Providing input via command line
Support Protocol 3: Providing input data via YAML format
Support Protocol 4: Restarting previous run
Support Protocol 5: Determining strand-specificity of RNA-seq data
79
Geng L, Zou M, Jiang H, Meng M, Xu W. Draft Genome Assembly of the Aral Barbel Luciobarbus brachycephalus Using PacBio Sequencing. Genome Biol Evol 2021; 13:6320064. [PMID: 34255058 PMCID: PMC8489429 DOI: 10.1093/gbe/evab131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/04/2021] [Indexed: 11/12/2022]
Abstract
The endangered Aral barbel Luciobarbus brachycephalus is endemic to the water systems of the Caspian Sea and Aral Sea. Given the scarcity of genetic data for the species, we present a draft assembly based on PacBio long-read sequencing technology. Approximately 299.4 Gb of long reads, representing 166× coverage of the estimated genome size, were generated, and the final assembly was composed of 653 contigs totaling approximately 1,698.3 Mb, with a contig N50 length of 4.5 Mb. A total of 807.6 Mb, approximately 47.6% of the assembly, was identified as repeats. A total of 54,600 putative protein-coding genes were predicted, of which 50,727 (approximately 92.9%) could be annotated by at least one database. Evolutionary analysis showed that L. brachycephalus and Labeo rohita diverged approximately 42.6 Ma, and the pronounced expansion of gene families in the L. brachycephalus genome may be attributed to a species-specific whole-genome duplication. This first genome assembly of L. brachycephalus not only provides a foundation for genetic conservation and molecular breeding of this species but also contributes to comparative analyses of genome biology and evolution within Cyprinidae.
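Contig N50, quoted above as 4.5 Mb, is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the contig lengths in the example are made up.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contigs.

    Sort contigs from longest to shortest and accumulate their lengths; the N50 is
    the length of the contig at which the running total first reaches half the
    total assembly size. Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 8, 10]))  # total 38; 10 + 8 + 6 = 24 >= 19, so N50 = 6
```

The same sequencing figures also imply the genome-size estimate behind the coverage value: 299.4 Gb at 166× corresponds to roughly 1.80 Gb.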
80
Bouchemousse S, Falquet L, Müller-Schärer H. Genome Assembly of the Ragweed Leaf Beetle: A Step Forward to Better Predict Rapid Evolution of a Weed Biocontrol Agent to Environmental Novelties. Genome Biol Evol 2021; 12:1167-1173. [PMID: 32428241 PMCID: PMC7486951 DOI: 10.1093/gbe/evaa102] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Accepted: 05/14/2020] [Indexed: 12/21/2022]
Abstract
Rapid evolution of weed biological control agents (BCAs) in response to new biotic and abiotic conditions is poorly understood and has so far received little consideration in both pre-release and post-release studies, despite potentially major implications: negative, through risks of non-target attacks, or positive, through colonization of previously unsuitable habitats. Provision of genetic resources, such as assembled and annotated genomes, is essential to assess potential adaptive processes by identifying the underlying genetic mechanisms. Here, we provide the first sequenced genome of a phytophagous insect used as a BCA, the leaf beetle Ophraella communa, a promising BCA of common ragweed that was recently and accidentally introduced into Europe. A total of 33.98 Gb of raw DNA sequence, representing ∼43-fold coverage, was obtained using PacBio SMRT Cell sequencing. Of the five assemblers tested, SMARTdenovo produced the assembly with the best scores, which was then corrected with Illumina short reads. The final genome of 774 Mb contains 7,003 scaffolds. The reliability of the final assembly was assessed by benchmarking universal single-copy orthologs (>96.0% of the 1,658 expected insect genes) and by remapping Illumina short reads (average of 98.6 ± 0.7% without filtering). The 75,642 predicted protein-coding genes, representing 82% of the published antennal transcriptome, and phylogenetic analyses based on 825 orthologous genes that place O. communa in the monophyletic Chrysomelidae confirm the relevance of our genome assembly. Overall, the genome provides a valuable resource for studying potential risks and benefits of this BCA facing environmental novelties.
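Fold coverage is total sequenced bases divided by genome size, so the yield and coverage stated above also imply the genome size used for the calculation. The Python sketch below does that arithmetic. The abstract does not state which genome-size estimate the ∼43-fold figure was based on, so the implied value is only a back-calculation; the function names are illustrative.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a stated sequencing yield and fold coverage."""
    return total_bases / coverage


raw_bases = 33.98e9  # 33.98 Gb of PacBio data, from the abstract
print(round(implied_genome_size(raw_bases, 43) / 1e6))  # about 790 (Mb)
print(round(fold_coverage(raw_bases, 774e6), 1))        # about 43.9x over the 774 Mb assembly
```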
81
Danko D, Bezdan D, Afshin EE, Ahsanuddin S, Bhattacharya C, Butler DJ, Chng KR, Donnellan D, Hecht J, Jackson K, Kuchin K, Karasikov M, Lyons A, Mak L, Meleshko D, Mustafa H, Mutai B, Neches RY, Ng A, Nikolayeva O, Nikolayeva T, Png E, Ryon KA, Sanchez JL, Shaaban H, Sierra MA, Thomas D, Young B, Abudayyeh OO, Alicea J, Bhattacharyya M, Blekhman R, Castro-Nallar E, Cañas AM, Chatziefthimiou AD, Crawford RW, De Filippis F, Deng Y, Desnues C, Dias-Neto E, Dybwad M, Elhaik E, Ercolini D, Frolova A, Gankin D, Gootenberg JS, Graf AB, Green DC, Hajirasouliha I, Hastings JJA, Hernandez M, Iraola G, Jang S, Kahles A, Kelly FJ, Knights K, Kyrpides NC, Łabaj PP, Lee PKH, Leung MHY, Ljungdahl PO, Mason-Buck G, McGrath K, Meydan C, Mongodin EF, Moraes MO, Nagarajan N, Nieto-Caballero M, Noushmehr H, Oliveira M, Ossowski S, Osuolale OO, Özcan O, Paez-Espino D, Rascovan N, Richard H, Rätsch G, Schriml LM, Semmler T, Sezerman OU, Shi L, Shi T, Siam R, Song LH, Suzuki H, Court DS, Tighe SW, Tong X, Udekwu KI, Ugalde JA, Valentine B, Vassilev DI, Vayndorf EM, Velavan TP, Wu J, Zambrano MM, Zhu J, Zhu S, Mason CE. A global metagenomic map of urban microbiomes and antimicrobial resistance. Cell 2021; 184:3376-3393.e17. [PMID: 34043940 PMCID: PMC8238498 DOI: 10.1016/j.cell.2021.05.002] [Citation(s) in RCA: 129] [Impact Index Per Article: 43.0] [Received: 12/02/2020] [Revised: 03/05/2021] [Accepted: 04/29/2021] [Indexed: 01/14/2023]
Abstract
We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.
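The set of 31 species found in 97% of samples is the result of a prevalence filter: count how many samples each species is detected in and keep those at or above a threshold. A minimal Python sketch, assuming each sample has already been reduced to a set of detected species; the detection criteria themselves (for example abundance cutoffs) are not specified here and lie outside this sketch, and the species names are placeholders.

```python
from collections import Counter


def core_species(samples, min_prevalence=0.97):
    """Return the species detected in at least `min_prevalence` of the samples.

    samples: list of sets, each holding the species detected in one sample.
    """
    counts = Counter()
    for detected in samples:
        counts.update(detected)  # each species counted at most once per sample
    threshold = min_prevalence * len(samples)
    return {species for species, n in counts.items() if n >= threshold}


samples = [{"sp1", "sp2"}, {"sp1", "sp3"}, {"sp1", "sp2"}, {"sp1"}]
print(core_species(samples, min_prevalence=0.75))  # {'sp1'}
```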
82
Liu Y, Wu X, Wang Y. An integrated approach for copy number variation discovery in parent-offspring trios. Brief Bioinform 2021; 22:6306464. [PMID: 34151932 DOI: 10.1093/bib/bbab230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Revised: 04/27/2021] [Accepted: 05/25/2021] [Indexed: 11/14/2022]
Abstract
Whole-genome sequencing (WGS) of parent-offspring trios has become widely used to identify causal copy number variations (CNVs) in rare and complex diseases. Existing CNV detection approaches usually do not make effective use of Mendelian inheritance in parent-offspring trios and yield low accuracy. In this study, we propose TrioCNV2, a novel integrated approach for jointly detecting CNVs from WGS data of a parent-offspring trio. TrioCNV2 first uses read depth and discordant read pairs to infer the approximate locations of CNVs and then employs split-read and local de novo assembly approaches to refine the breakpoints. We use real WGS data from two parent-offspring trios to demonstrate TrioCNV2's performance and compare it with other CNV detection approaches. TrioCNV2 is implemented in a combination of Java and R and is freely available at https://github.com/yongzhuang/TrioCNV2.
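Mendelian inheritance constrains trio copy numbers: the child's copy number must be the sum of one haplotype transmitted by each parent. The Python sketch below checks that constraint under a deliberately simplified, hypothetical model in which each haplotype carries 0 or 1 copy (deletions only, diploid copy numbers 0 to 2). It is not TrioCNV2's model, and `TRANSMISSIBLE` and `is_mendelian_consistent` are illustrative names.

```python
from itertools import product

# Hypothetical deletion-only model: haplotype copy numbers a parent with a given
# diploid copy number can transmit (CN 2 -> 1+1, CN 1 -> 1+0, CN 0 -> 0+0).
TRANSMISSIBLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father_cn, mother_cn, child_cn):
    """True if the child's copy number can be formed from one haplotype per parent.

    Raises KeyError for parental copy numbers outside 0-2 (not covered by this model).
    """
    return any(
        paternal + maternal == child_cn
        for paternal, maternal in product(TRANSMISSIBLE[father_cn], TRANSMISSIBLE[mother_cn])
    )


print(is_mendelian_consistent(1, 2, 1))  # True: father transmits the deleted haplotype
print(is_mendelian_consistent(2, 2, 1))  # False: a candidate de novo deletion
```

In a joint trio caller, a constraint of this kind can be used to favour genotype combinations that respect inheritance, while violations flag candidate de novo events or errors.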
83
Martínez Arbas S, Busi SB, Queirós P, de Nies L, Herold M, May P, Wilmes P, Muller EEL, Narayanasamy S. Challenges, Strategies, and Perspectives for Reference-Independent Longitudinal Multi-Omic Microbiome Studies. Front Genet 2021; 12:666244. [PMID: 34194470 PMCID: PMC8236828 DOI: 10.3389/fgene.2021.666244] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/09/2021] [Accepted: 04/30/2021] [Indexed: 12/21/2022]
Abstract
In recent years, multi-omic studies have made it possible to resolve the structure and interrogate the function of microbial communities. Simultaneous generation of metagenomic, metatranscriptomic, metaproteomic, and (meta)metabolomic data is more feasible than ever before, enabling in-depth assessment of community structure, function, and phenotype; this has resulted in a multitude of multi-omic microbiome datasets and in the development of innovative methods to integrate and interrogate them. Specifically, reference-independent approaches provide opportunities to identify novel organisms and functions. At present, most large-scale multi-omic datasets stem from spatial sampling (e.g., water/soil microbiomes at several depths, microbiomes in/on different parts of the human anatomy) or case-control studies (e.g., cohorts of human microbiomes). We believe that longitudinal multi-omic microbiome datasets are the logical next step in microbiome studies because of their characteristic advantages in providing a better understanding of community dynamics, including observation of trends, inference of causality, and, ultimately, prediction of community behavior. Furthermore, the acquisition of complementary host-derived omics, environmental measurements, and suitable metadata will further enhance these advantages of longitudinal data, which will serve as the basis for resolving drivers of community structure and function and for understanding the biotic and abiotic factors governing communities and specific populations. Carefully designed future experiments hold great potential to further unveil the ecological mechanisms underlying evolution, microbe-microbe interactions, and microbe-host interactions.
In this article, we discuss the challenges, emerging strategies, and best practices applicable to longitudinal microbiome studies, ranging from sampling and biomolecular extraction to systematic multi-omic measurements, reference-independent data integration, modeling, and validation.
84
Lukicheva S, Flot JF, Mardulyn P. Genome Assembly of the Cold-Tolerant Leaf Beetle Gonioctena quinquepunctata, an Important Resource for Studying Its Evolution and Reproductive Barriers between Species. Genome Biol Evol 2021; 13:6296840. [PMID: 34115123 PMCID: PMC8290105 DOI: 10.1093/gbe/evab134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 06/04/2021] [Indexed: 02/06/2023]
Abstract
Coleoptera is the most species-rich insect order, yet it is currently underrepresented in genomic databases. An assembly was generated for the ca. 1.7 Gb genome of the leaf beetle Gonioctena quinquepunctata by first assembling long sequence reads (Oxford Nanopore; ~27-fold coverage) and subsequently polishing the resulting assembly with short sequence reads (Illumina; ~85-fold coverage). The unusually large genome size (most Coleoptera species have a reported genome size below 1 Gb) was at least partially attributed to a large fraction of repeated elements (73.8%). The final assembly had an N50 length of 432 kb and a BUSCO score of 95.5%. The heterozygosity rate was ~0.6%. Automated genome annotation informed by RNA-Seq resulted in 40,568 predicted proteins, far more than the 17,000–23,000 typically predicted for other Coleoptera. However, no evidence of a genome duplication was detected. This new reference genome will contribute to our understanding of genetic variation in the Coleoptera. Among other applications, it will allow the exploration of reproductive barriers between species, the investigation of introgression in the nuclear genome, and the identification of genes involved in resistance to extreme climate conditions.
85
Liang KC, Sakakibara Y. MetaVelvet-DL: a MetaVelvet deep learning extension for de novo metagenome assembly. BMC Bioinformatics 2021; 22:427. [PMID: 34078257 PMCID: PMC8171044 DOI: 10.1186/s12859-020-03737-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/09/2020] [Accepted: 09/03/2020] [Indexed: 11/10/2022]
Abstract
Background The increasing use of whole metagenome sequencing has spurred the need to improve de novo assemblers to facilitate the discovery of unknown species and the analysis of their genomic functions. MetaVelvet-SL is a short-read de novo metagenome assembler that partitions a multi-species de Bruijn graph into single-species sub-graphs. This study aimed to improve the performance of MetaVelvet-SL by using a deep learning-based model to predict the partition nodes in a multi-species de Bruijn graph. Results This study showed that recent advances in deep learning offer the opportunity to better exploit sequence information and differentiate genomes of different species in a metagenomic sample. We developed an extension to MetaVelvet-SL, which we named MetaVelvet-DL, that builds an end-to-end architecture using Convolutional Neural Network and Long Short-Term Memory units. The deep learning model in MetaVelvet-DL predicts how to partition a de Bruijn graph more accurately than the Support Vector Machine-based model in MetaVelvet-SL. Assembly of the Critical Assessment of Metagenome Interpretation (CAMI) dataset showed that, after removal of chimeric assemblies, MetaVelvet-DL produced longer single-species contigs, with fewer misassembled contigs, than MetaVelvet-SL. Conclusions MetaVelvet-DL provides more accurate de novo assemblies of whole metagenome data. The authors believe that this improvement can help further the understanding of microbiomes by providing a more accurate description of the metagenomic samples under analysis.
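MetaVelvet-SL and MetaVelvet-DL operate on a de Bruijn graph built from short reads. The Python sketch below shows only that underlying structure, not the deep-learning partitioning step: each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It is a simplification that assumes error-free reads on a single strand; real assemblers also merge reverse complements and track k-mer coverage, which carries information useful for separating species.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as an adjacency map of (k-1)-mers.

    Every k-mer in every read adds an edge prefix -> suffix, where prefix and
    suffix are the k-mer's first and last k-1 bases. Edge multiplicities are
    collapsed into a set in this simplified version.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGTC"], k=3))
# {'AC': {'CG'}, 'CG': {'GT'}, 'GT': {'TC'}}
```

Reads from unrelated genomes tend to form separate connected components, while shared or repeated sequence joins them; deciding where to cut such junctions is the partitioning problem MetaVelvet-DL addresses.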
86
Kajitani R, Yoshimura D, Ogura Y, Gotoh Y, Hayashi T, Itoh T. Platanus_B: an accurate de novo assembler for bacterial genomes using an iterative error-removal process. DNA Res 2021; 27:5870828. [PMID: 32658266 PMCID: PMC7433917 DOI: 10.1093/dnares/dsaa014] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/06/2020] [Accepted: 07/07/2020] [Indexed: 11/14/2022]
Abstract
De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, existing tools often lack the accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. Benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, as well as its ability to enhance hybrid assembly of short and nanopore long reads. Although hybrid strategies combining short and long reads were effective in achieving near-full-length genomes, we found that short-read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long-read-only assemblies lacked fine-scale accuracy, the inclusion of short reads was effective in improving it. Platanus_B can, therefore, be used for comprehensive genomic surveillance of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.
87
Segawa T, Nishiyama C, Tamiru-Oli M, Sugihara Y, Abe A, Sone H, Itoh N, Asukai M, Uemura A, Oikawa K, Utsushi H, Ikegami-Katayama A, Imamura T, Mori M, Terauchi R, Takagi H. Sat-BSA: an NGS-based method using local de novo assembly of long reads for rapid identification of genomic structural variations associated with agronomic traits. Breed Sci 2021; 71:299-312. [PMID: 34776737 PMCID: PMC8573553 DOI: 10.1270/jsbbs.20148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/10/2020] [Accepted: 01/18/2021] [Indexed: 05/29/2023]
Abstract
Advances in next-generation sequencing (NGS)-based methodologies have accelerated the identification of simple genetic variants such as point mutations and small insertions/deletions (InDels). Structural variants (SVs), including large InDels and rearrangements, provide vital sources of genetic diversity for plant breeding. However, their analysis remains a challenge due to their complex nature. Consequently, novel NGS-based approaches are needed to rapidly and accurately identify SVs. Here, we present an NGS-based bulked-segregant analysis (BSA) technique called Sat-BSA (SVs associated with traits) for identifying SVs controlling traits of interest in crops. Sat-BSA targets allele frequencies at all SNP positions to first identify candidate genomic regions associated with a trait, which are then reconstructed by long-read-based local de novo assembly. Finally, the association among SVs, RNA-seq-based gene expression patterns and the trait is evaluated across multiple cultivars to narrow down the candidate genes. We applied Sat-BSA to segregating F2 progeny obtained from crosses between turnip cultivars with different tuber colors and successfully isolated two genes harboring SVs that are responsible for tuber phenotypes. The current study demonstrates the utility of Sat-BSA for identifying SVs associated with traits of interest in species with large and heterozygous genomes.
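Sat-BSA's first step scores allele frequencies at SNP positions in the bulks. A common statistic in NGS-based BSA is the SNP-index (the fraction of reads carrying the alternative allele) and its difference between bulks, averaged over windows. Whether Sat-BSA uses exactly this statistic is an assumption here; the Python sketch below, with hypothetical read counts and illustrative function names, only shows the idea.

```python
def snp_index(alt_depth, total_depth):
    """Fraction of reads at a SNP that support the alternative allele."""
    return alt_depth / total_depth if total_depth else float("nan")


def delta_snp_index(bulk_a, bulk_b):
    """Per-SNP difference in SNP-index between two bulks.

    bulk_a, bulk_b: lists of (alt_depth, total_depth) tuples for the same SNPs,
    in the same order.
    """
    return [snp_index(*a) - snp_index(*b) for a, b in zip(bulk_a, bulk_b)]


def sliding_mean(values, window):
    """Mean over consecutive windows of `window` SNPs (a simplification of
    windows defined by genomic coordinates)."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


bulk_a = [(10, 20), (18, 20), (20, 20)]  # hypothetical bulk with one tuber color
bulk_b = [(10, 20), (2, 20), (0, 20)]    # hypothetical bulk with the other color
delta = delta_snp_index(bulk_a, bulk_b)
print([round(d, 2) for d in delta])                   # [0.0, 0.8, 1.0]
print([round(m, 2) for m in sliding_mean(delta, 2)])  # [0.4, 0.9]
```

Windows where the difference approaches 1 or -1 mark regions in which the two bulks carry different parental alleles, which are the kind of candidate regions passed on to local de novo assembly.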
88
Kohli S, Gulati P, Narang A, Maini J, Shamsudheen KV, Pandey R, Scaria V, Sivasubbu S, Brahmachari V. Genome and transcriptome analysis of the mealybug Maconellicoccus hirsutus: Correlation with its unique phenotypes. Genomics 2021; 113:2483-2494. [PMID: 34022346 DOI: 10.1016/j.ygeno.2021.05.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/21/2020] [Revised: 04/02/2021] [Accepted: 05/17/2021] [Indexed: 11/27/2022]
Abstract
Mealybugs are aggressive pests with a worldwide distribution and are suitable for the study of phenomena such as genomic imprinting and epigenetics. Genomic approaches facilitate these studies in the absence of robust genetics in this system. We sequenced, de novo assembled, and annotated the Maconellicoccus hirsutus genome. We carried out comparative genomic analyses with four mealybug and eight other insect species to identify expanded, specific and contracted gene classes related to pesticide and desiccation resistance. We identified horizontally transferred genes that contribute to the mutualism between the mealybug and its endosymbionts. Analysis of male and female transcriptomes indicates differential expression of metabolic pathway genes, correlating with their physiology, and of genes for sexual dimorphism. The significantly lower expression of endosymbiont genes in males relates to the depletion of endosymbionts in males during development.
89
Al Qaffas A, Nichols J, Davison AJ, Ourahmane A, Hertel L, McVoy MA, Camiolo S. LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data. Virus Evol 2021; 7:veab042. [PMID: 33996146 PMCID: PMC8111061 DOI: 10.1093/ve/veab042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads generated on long-read platforms make the tools used for assembling short reads unsuitable for assembling these data and motivate the development of new approaches. We present LoReTTA (Long Read Template-Targeted Assembler), a tool designed for de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads. The tool was designed to handle reads originating from viral genomes, which feature high genetic variability, possible multiple isoforms, and the dominant presence of additional organisms in clinical or environmental samples. LoReTTA was tested on a range of simulated and experimental datasets and outperformed established long-read assemblers in terms of assembly contiguity and accuracy. The software runs under the Linux operating system, is designed for easy adaptation to alternative systems, and features an automatic installation pipeline that takes care of the required dependencies. A command-line version and a user-friendly graphical interface version are available under a GPLv3 license at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
90
Islam R, Raju RS, Tasnim N, Shihab IH, Bhuiyan MA, Araf Y, Islam T. Choice of assemblers has a critical impact on de novo assembly of SARS-CoV-2 genome and characterizing variants. Brief Bioinform 2021; 22:6210065. [PMID: 33822878 PMCID: PMC8083570 DOI: 10.1093/bib/bbab102] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/15/2021] [Revised: 03/06/2021] [Accepted: 03/08/2021] [Indexed: 12/18/2022]
Abstract
Background Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic following its initial emergence in China. SARS-CoV-2 is a positive-sense single-stranded RNA virus with a genome of around 30 kb. Using next-generation sequencing technologies, a large number of SARS-CoV-2 genomes are being sequenced at an unprecedented rate and deposited in public repositories. A myriad of assemblers are being used for the de novo assembly of SARS-CoV-2 genomes, although their impact on assembly quality has not been characterized for this virus. In this study, we aim to understand the variability in assembly quality due to the choice of assembler. Results We performed 6648 de novo assemblies of 416 SARS-CoV-2 samples using eight different assemblers with different k-mer lengths. We used Illumina paired-end sequencing reads and compared the assembly quality of those assemblers. We showed that the choice of assembler plays a significant role in reconstructing the SARS-CoV-2 genome. Two metagenomic assemblers, MEGAHIT and metaSPAdes, performed better than the others on most assembly quality metrics, including recovery of a larger fraction of the genome, construction of larger contigs, and higher N50 and NA50 values. We showed that at least 9% (259/2873) of the variants present in the MEGAHIT and metaSPAdes assemblies are unique to one of the two assembly methods. Conclusion Our analyses indicate the critical role of assembly methods in assembling the SARS-CoV-2 genome from short reads and their impact on variant characterization. This study could help guide future studies in determining the best-suited assembler for the de novo assembly of virus genomes.
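The 259/2873 figure compares variant calls between two assemblies. A minimal Python sketch of such a comparison, assuming variants are represented as (position, ref, alt) tuples and that the denominator is the union of both call sets; the abstract does not define the denominator, and the call sets below are hypothetical.

```python
def unique_variant_fraction(variants_a, variants_b):
    """Fraction of all variants (union of both sets) called by exactly one assembly."""
    a, b = set(variants_a), set(variants_b)
    union = a | b
    unique = a ^ b  # symmetric difference: present in one call set but not the other
    return len(unique) / len(union) if union else 0.0


# Hypothetical call sets from two assemblies of the same sample.
calls_a = {(100, "C", "T"), (200, "C", "T"), (300, "A", "G")}
calls_b = {(100, "C", "T"), (200, "C", "T"), (400, "G", "A")}
print(unique_variant_fraction(calls_a, calls_b))  # 2 of 4 -> 0.5
print(round(259 / 2873, 3))  # the abstract's ratio, about 0.09
```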
91
Mishra A, Singh A, Mantri S, Pandey AK, Garg M, Deshmukh R, Sonah H, Kandoth PK, Sharma TR, Roy J. Decoding the genome of superior chapatti quality Indian wheat variety 'C 306' unravelled novel genomic variants for chapatti and nutrition quality related genes. Genomics 2021; 113:1919-1929. [PMID: 33823224 DOI: 10.1016/j.ygeno.2021.03.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/22/2020] [Revised: 03/18/2021] [Accepted: 03/29/2021] [Indexed: 11/26/2022]
Abstract
An Indian wheat variety, 'C 306', has good chapatti quality, which is controlled by multiple genes that have not been explored. We report a high-quality de novo assembled genome of 'C 306' obtained by combining short- and long-read sequencing data. The hybrid assembly covered 93% of the gene space and identified about 142K coding genes, 34% repetitive DNA and ~501K SSR motifs. Phylogenetic analysis of about 83K orthologous protein groups suggested the closest relationship with T. turgidum, T. aestivum and Ae. tauschii. Genome-wide analysis annotated 69,217,536 genomic variants. Of these, 1423 missense and 117 deleterious variants were identified in processing-, nutrition- and chapatti quality-related genes, such as those encoding alpha- and beta-gliadin, SSI, SSIII, SUT1, SBEI, CHS, YSL, DMAS, and NAS proteins. These variants may affect the quality-related genes. These genomic data will be a potential resource for quality improvement in wheat breeding programs.
92
Liu Y, Helmann TC, Stodghill P, Filiatrault MJ. Complete Genome Sequence Resource for the Necrotrophic Plant-Pathogenic Bacterium Dickeya dianthicola 67-19 Isolated From New Guinea Impatiens. PLANT DISEASE 2021; 105:1174-1176. [PMID: 33064625 DOI: 10.1094/pdis-09-20-1968-a] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
New Guinea impatiens (NGI, Impatiens hawkeri) are popular bedding plants that can be affected by a number of pathogens. Using 16S rDNA sequencing and genus-specific PCR, we identified the first Dickeya dianthicola strain isolated from NGI presenting with blackleg symptoms, herein designated D. dianthicola 67-19. Here, we report a high-quality, complete, and annotated genome sequence of D. dianthicola 67-19. The 4,851,809 bp genome was assembled from Nanopore reads and polished with Illumina reads, at 422× and 105× coverage, respectively. This closed genome provides a resource for future research on the comparative genomics and biology of D. dianthicola, which could translate into improved detection and disease management.
93
Harper JR, Sripada N, Kher P, Whittall JB, Edgerly JS. Interpreting nature's finest insect silks (Order Embioptera): hydropathy, interrupted repetitive motifs, and fiber-to-film transformation for two neotropical species. ZOOLOGY 2021; 146:125923. [PMID: 33901836 DOI: 10.1016/j.zool.2021.125923] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2020] [Revised: 03/14/2021] [Accepted: 03/19/2021] [Indexed: 10/21/2022]
Abstract
Silks produced by webspinners (Order Embioptera) interact with water by transforming from fiber to film, which then becomes slippery and capable of shedding water. We explored this mechanism by analyzing and comparing the silk protein transcripts of two species with overlapping distributions in Trinidad but from different taxonomic families. The transcript of one, Antipaluria urichi (Clothodidae), was partially characterized in 2009, providing a control for our methods to characterize a second species, Pararhagadochir trinitatis (Scelembiidae), a family that adds to the taxon sampling for this little-known order of insects. Previous reports showed that embiopteran silk protein (dubbed Efibroin) consists of a protein core of repetitive motifs largely composed of glycine (Gly), serine (Ser), and alanine (Ala), and a highly conserved C-terminal region. Based on mRNA extracted from silk glands, next-generation sequencing, and de novo assembly, P. trinitatis silk can be characterized by repetitive Gly-Ser motifs followed periodically by Gly-asparagine (Asn, an unusual amino acid for Efibroins), and by a lack of Ala, which is otherwise common in Efibroins. The putative N-terminal domain, composed mostly of polar, charged, and bulky amino acids, is ten amino acids long with cysteine in the 10th position, a feature likely related to stabilization of the silk fibers. The 29 amino acids of the C-terminus of P. trinitatis silk closely resemble those of other Efibroin sequences, which share 74% identity on average. Examination of the hydropathicity of the Efibroins of both P. trinitatis and An. urichi revealed that these proteins are largely hydrophilic despite a thin lipid coating on each nanofiber. We deduced that the basis of this hydrophilicity differs between the two species: it is due to Ser and Asn in P. trinitatis silk and to previously undetected spacers in An. urichi silk.
Spacers are known from some spider and silkworm silks, but this is the first report of such spacers in Embioptera. Analysis of hydropathicity revealed the largely hydrophilic quality of these silks, and this feature likely explains why water causes the transformation from fiber to film. We compared spun silk to the transcript and detected non-negligible differences between the two measurements, implying that as-yet-undetermined post-translational modifications of the silk may occur. In addition, we found evidence for codon bias in the nucleotides of the putative silk transcript of P. trinitatis, a feature also known from other embiopteran silk genes.
95
Prall TM, Neumann EK, Karl JA, Shortreed CG, Baker DA, Bussan HE, Wiseman RW, O'Connor DH. Consistent ultra-long DNA sequencing with automated slow pipetting. BMC Genomics 2021; 22:182. [PMID: 33711930 PMCID: PMC7953553 DOI: 10.1186/s12864-021-07500-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/17/2020] [Accepted: 03/02/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND Oxford Nanopore Technologies' instruments can sequence reads of great length. Long reads improve sequence assemblies by unambiguously spanning repetitive elements of the genome. Sequencing reads of significant length requires the preservation of long DNA template molecules through library preparation by pipetting reagents as slowly as possible to minimize shearing. This process is time-consuming and inconsistent at preserving read length as even small changes in volumetric flow rate can result in template shearing. RESULTS We have designed SNAILS (Slow Nucleic Acid Instrument for Long Sequences), a 3D-printable instrument that automates slow pipetting of reagents used in long read library preparation for Oxford Nanopore sequencing. Across six sequencing libraries, SNAILS preserved more reads exceeding 100 kilobases in length and increased its libraries' average read length over manual slow pipetting. CONCLUSIONS SNAILS is a low-cost, easily deployable solution for improving sequencing projects that require reads of significant length. By automating the slow pipetting of library preparation reagents, SNAILS increases the consistency and throughput of long read Nanopore sequencing.
96
Bally ISE, Bombarely A, Chambers AH, Cohen Y, Dillon NL, Innes DJ, Islas-Osuna MA, Kuhn DN, Mueller LA, Ophir R, Rambani A, Sherman A, Yan H. The 'Tommy Atkins' mango genome reveals candidate genes for fruit quality. BMC PLANT BIOLOGY 2021; 21:108. [PMID: 33618672 PMCID: PMC7898432 DOI: 10.1186/s12870-021-02858-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/06/2020] [Accepted: 01/28/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND Mango (Mangifera indica L.), an important tropical fruit crop, is grown for its sweet and aromatic fruits. Past improvement of this species has predominantly relied on chance seedlings derived from over 1000 cultivars in the Indian subcontinent, with large variation in fruit size, yield, biotic and abiotic stress resistance, and fruit quality, among other traits. Historically, mango has been an orphan crop with very limited molecular information. Only recently have molecular and genomics-based analyses enabled the creation of linkage maps, transcriptomes, and diversity analyses of large collections. Additionally, the combined analysis of genomic and phenotypic information is poised to improve mango breeding efficiency. RESULTS This study sequenced, de novo assembled, analyzed, and annotated the genome of the monoembryonic mango cultivar 'Tommy Atkins'. The draft genome sequence was generated using NRGene de-novo Magic on high-molecular-weight DNA of 'Tommy Atkins', supplemented by 10X Genomics linked-read sequencing to improve the initial assembly. A 'Tommy Atkins' × 'Kensington Pride' hybrid population was used to generate phased haplotype chromosomes and a highly resolved phased SNP map. The final 'Tommy Atkins' genome assembly was a consensus sequence that included 20 pseudomolecules representing the 20 chromosomes of mango and covered ~86% of the ~439 Mb haploid mango genome. Skim sequencing of the 'Tommy Atkins' × 'Kensington Pride' mapping population identified ~3.3 M SNPs. After repeat masking, annotation identified 26,616 genes with a median length of 3348 bp. A whole-genome duplication analysis revealed an ancestral polyploidization event, 65 MYA, shared with Anacardium occidentale. Two regions, one on LG4 and one on LG7, containing 28 candidate genes were associated with the commercially important fruit size characteristic in the mapping population.
CONCLUSIONS The availability of the complete 'Tommy Atkins' mango genome will aid global initiatives to study mango genetics.
97
Cui Y, Wu B, Peng A, Song X, Chen X. The Genome of Banana Leaf Blight Pathogen Fusarium sacchari str. FS66 Harbors Widespread Gene Transfer From Fusarium oxysporum. FRONTIERS IN PLANT SCIENCE 2021; 12:629859. [PMID: 33613610 PMCID: PMC7889605 DOI: 10.3389/fpls.2021.629859] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/16/2020] [Accepted: 01/12/2021] [Indexed: 06/12/2023]
Abstract
Fusarium species have been identified as pathogens causing many different plant diseases, and here we report an emerging banana leaf blight (BLB) caused by F. sacchari (Fs) discovered in Guangdong, China. A fungal isolate was obtained from symptomatic tissues collected in the field, and it induced similar symptoms on healthy banana seedlings after inoculation. Koch's postulates were fulfilled after re-isolation of the pathogen. Phylogenetic analysis of two gene segments and of the whole genome sequence identified the pathogen as Fs, and the isolate was named Fs str. FS66. A 45.74 Mb genome of FS66 was obtained through de novo assembly of long-read sequencing data, and its contig N50 (1.97 Mb) is more than 10-fold larger than that of the previously available genome of this species. Based on transcriptome sequencing and ab initio gene annotation, a total of 14,486 protein-coding genes and 418 non-coding RNAs were predicted. A total of 48 metabolite biosynthetic gene clusters, including the fusaric acid biosynthesis gene cluster, were predicted in silico in the FS66 genome. Comparison between FS66 and 11 other Fusarium genomes identified tens to hundreds of genes specifically gained or lost in FS66, including some previously correlated with Fusarium pathogenicity. The FS66 genome also harbors widespread gene transfer on the core chromosomes, putatively from the F. oxysporum species complex (FOSC), including 30 genes involved in Fusarium pathogenicity/virulence. This study not only reports BLB caused by Fs but also provides important information and clues for further understanding of genome evolution among pathogenic Fusarium species.
98
Zhao L, Xu S, Han Z, Liu Q, Ke W, Liu A, Gao T. Chromosome-Level Genome Assembly and Annotation of a Sciaenid Fish, Argyrosomus japonicus. Genome Biol Evol 2021; 13:evaa246. [PMID: 33484557 PMCID: PMC7874996 DOI: 10.1093/gbe/evaa246] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 11/17/2020] [Indexed: 11/22/2022]
Abstract
Argyrosomus japonicus is an economically and ecologically important fish species in the family Sciaenidae, with a wide distribution in the world's oceans. Here, we report a high-quality, chromosome-level genome assembly of A. japonicus based on PacBio and Hi-C sequencing technologies. A 673.7-Mb genome containing 282 contigs with an N50 length of 18.4 Mb was obtained from PacBio long reads. These contigs were further ordered and clustered into 24 chromosome groups based on Hi-C data. In addition, a total of 217.2 Mb of sequence (32.24% of the assembled genome) was identified as repeat elements, and 23,730 protein-coding genes were predicted using multiple approaches. More than 97% of BUSCO genes were identified in the A. japonicus genome. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population genetics, conservation biology, and selective breeding studies of A. japonicus but also lays a solid foundation for the study of Sciaenidae evolution.
99
Mhuantong W, Charoensri S, Poonsrisawat A, Pootakham W, Tangphatsornruang S, Siamphan C, Suwannarangsee S, Eurwilaichitr L, Champreda V, Charoensawan V, Chantasingh D. High Quality Aspergillus aculeatus Genomes and Transcriptomes: A Platform for Cellulase Activity Optimization Toward Industrial Applications. Front Bioeng Biotechnol 2021; 8:607176. [PMID: 33585410 PMCID: PMC7873481 DOI: 10.3389/fbioe.2020.607176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/14/2020] [Accepted: 12/31/2020] [Indexed: 11/24/2022]
100
Iannucci A, Makunin AI, Lisachov AP, Ciofi C, Stanyon R, Svartman M, Trifonov VA. Bridging the Gap between Vertebrate Cytogenetics and Genomics with Single-Chromosome Sequencing (ChromSeq). Genes (Basel) 2021; 12:124. [PMID: 33478118 PMCID: PMC7835784 DOI: 10.3390/genes12010124] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/30/2020] [Revised: 01/10/2021] [Accepted: 01/15/2021] [Indexed: 01/23/2023]
Abstract
The study of vertebrate genome evolution is currently undergoing a revolution, brought about by next-generation sequencing technologies that allow researchers to produce nearly complete and error-free genome assemblies. However, these novel approaches do not always provide a direct link to the information on vertebrate genome evolution gained from cytogenetic approaches. It is useful to preserve cytogenetic data and link them with novel genomic discoveries. Sequencing of DNA from single isolated chromosomes (ChromSeq) is an elegant approach to determining chromosome content and assigning genome assemblies to chromosomes, thus bridging the gap between cytogenetics and genomics. The aim of this paper is to describe how ChromSeq can support the study of vertebrate genome evolution and how it can help link cytogenetic and genomic data. We show key examples of ChromSeq application in the refinement of vertebrate genome assemblies and in the study of vertebrate chromosome and karyotype evolution. We also provide a general overview of the approach and a concrete example of genome refinement using this method in the species Anolis carolinensis.