1. Cleal K, Baird DM. Dysgu: efficient structural variant calling using short or long reads. Nucleic Acids Res 2022; 50:e53. [PMID: 35100420 PMCID: PMC9122538 DOI: 10.1093/nar/gkac039] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/03/2021] [Revised: 12/20/2021] [Accepted: 01/24/2022] [Indexed: 12/27/2022]
Abstract
Structural variation (SV) plays a fundamental role in genome evolution and can underlie inherited or acquired diseases such as cancer. Long-read sequencing technologies have led to improvements in the characterization of structural variants (SVs), although paired-end sequencing offers better scalability. Here, we present dysgu, which calls SVs or indels using paired-end or long reads. Dysgu detects signals from alignment gaps, discordant and supplementary mappings, and generates consensus contigs, before classifying events using machine learning. Additional SVs are identified by remapping of anomalous sequences. Dysgu outperforms existing state-of-the-art tools using paired-end or long reads, offering high sensitivity and precision whilst being among the fastest tools to run. We find that combining low-coverage paired-end and long reads is competitive in terms of performance with long reads at higher coverage values.
Affiliation(s)
- Kez Cleal
  - Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Duncan M Baird
  - Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK

2. Alanko J, Alipanahi B, Settle J, Boucher C, Gagie T. Buffering updates enables efficient dynamic de Bruijn graphs. Comput Struct Biotechnol J 2021; 19:4067-4078. [PMID: 34377371 PMCID: PMC8326735 DOI: 10.1016/j.csbj.2021.06.047] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 06/29/2021] [Accepted: 06/29/2021] [Indexed: 12/24/2022]
Abstract
MOTIVATION: The de Bruijn graph has become a ubiquitous graph model for biological data since its introduction in the late 1990s. It has been used for a variety of purposes, including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection (Alipanahi et al., 2020b; Iqbal et al., 2012), and storage of assembled genomes (Chikhi et al., 2016). For this reason, there have been over a dozen methods for building and representing the de Bruijn graph and its variants in a space- and time-efficient manner. RESULTS: With the exception of a few data structures (Muggli et al., 2019; Holley and Melsted, 2020; Crawford et al., 2018), compressed and compact de Bruijn graphs do not allow the graph to be updated efficiently, that is, to have data added or deleted. The most recent compressed dynamic de Bruijn graph (Alipanahi et al., 2020a) relies on dynamic bit vectors, which are slow in theory and practice. To address this shortcoming, we present a compressed dynamic de Bruijn graph that removes the necessity of dynamic bit vectors by buffering data that should be added or removed from the graph. We implement our method, which we refer to as BufBOSS, and compare its performance to Bifrost, DynamicBOSS, and FDBG. Our experiments demonstrate that BufBOSS achieves attractive trade-offs compared to other tools in terms of time, memory and disk, and has the best deletion performance by an order of magnitude.
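The buffering idea described in this abstract can be illustrated with a small sketch. This is not BufBOSS itself: the class name `BufferedDeBruijn`, the `buffer_limit` parameter, and the use of a plain `frozenset` in place of a compressed static structure are all assumptions made for illustration. Pending insertions and deletions are held in small sets and merged into the static part in one batch, so the static part never needs in-place updates.

```python
class BufferedDeBruijn:
    """Toy node-centric de Bruijn graph over k-mers with buffered updates.

    Hypothetical illustration only: a frozenset stands in for the
    compressed static structure that a real tool would use.
    """

    def __init__(self, k, buffer_limit=4):
        self.k = k
        self.buffer_limit = buffer_limit
        self.static = frozenset()  # rebuilt only when the buffers are flushed
        self.added = set()         # pending insertions
        self.deleted = set()       # pending deletions

    def _kmers(self, seq):
        return (seq[i:i + self.k] for i in range(len(seq) - self.k + 1))

    def add(self, seq):
        for km in self._kmers(seq):
            self.deleted.discard(km)
            if km not in self.static:
                self.added.add(km)
        self._maybe_flush()

    def delete(self, seq):
        for km in self._kmers(seq):
            self.added.discard(km)
            if km in self.static:
                self.deleted.add(km)
        self._maybe_flush()

    def _maybe_flush(self):
        if len(self.added) + len(self.deleted) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Merge both buffers into the static part in a single batch.
        self.static = frozenset((self.static - self.deleted) | self.added)
        self.added.clear()
        self.deleted.clear()

    def __contains__(self, kmer):
        if kmer in self.deleted:
            return False
        return kmer in self.added or kmer in self.static

    def successors(self, kmer):
        # Edges are implied by (k-1)-overlaps between present k-mers.
        return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in self]
```

Queries consult the buffers before the static part, so the graph always reflects the latest updates even when nothing has been flushed yet.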
Affiliation(s)
- Jarno Alanko
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
- Bahar Alipanahi
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Jonathen Settle
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Travis Gagie
  - Department of Computer Science, University of Helsinki, Helsinki, Finland

3. Wu CS, Sudianto E, Hung YM, Wang BC, Huang CJ, Chen CT, Chaw SM. Genome skimming and exploration of DNA barcodes for Taiwan endemic cypresses. Sci Rep 2020; 10:20650. [PMID: 33244113 PMCID: PMC7693304 DOI: 10.1038/s41598-020-77492-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2020] [Accepted: 09/16/2020] [Indexed: 11/23/2022]
Abstract
Cypresses are characterized by their longevity and valuable timber. In Taiwan, two endemic cypress species, Chamaecyparis formosensis and C. obtusa var. formosana, are threatened by prevalent illegal logging. A DNA barcode system is urgently needed for reforestation and conservation of these two cypresses. In this study, both plastomes and 35S rDNAs from 16, 10, and 6 individuals of C. formosensis, C. obtusa var. formosana, and C. obtusa var. obtusa were sequenced, respectively. We show that the loss of plastid trnT-GGU readily distinguishes C. formosensis from its congeneric species. We demonstrate that entire sequences of plastomes or 35S rDNAs are capable of correctly identifying cypress species and varieties, suggesting that they are effective super-barcodes. We also discover three short hypervariable loci (i.e., 3′ETS, ITS1, and trnH-psbA) that are promising barcodes for identifying cypress species and varieties. Moreover, nine species-specific indels of > 100 bp were detected in the cypress plastomes. These indels, together with the three aforementioned short barcodes, constitute an alternative and powerful barcode system crucial for identifying specimens that are fragmentary or contain degraded/poor DNA. Our sequenced data and barcode systems not only enrich the genetic reference for cypresses, but also contribute to future reforestation, conservation, and forensic investigations.
Affiliation(s)
- Chung-Shien Wu
  - Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
- Edi Sudianto
  - Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
- Yu-Mei Hung
  - Department of Forensic Science Investigation Bureau, Ministry of Justice, New Taipei City, 231209, Taiwan
- Bo-Cyun Wang
  - Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
- Chiun-Jr Huang
  - School of Forestry and Resource Conservation, National Taiwan University, Taipei, 10617, Taiwan
- Chi-Tsong Chen
  - Department of Forensic Science Investigation Bureau, Ministry of Justice, New Taipei City, 231209, Taiwan
- Shu-Miaw Chaw
  - Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan

4. Ebrahimpour Boroojeny A, Shrestha A, Sharifi-Zarchi A, Gallagher SR, Sahinalp SC, Chitsaz H. Graph Traversal Edit Distance and Extensions. J Comput Biol 2020; 27:317-329. [PMID: 32058803 DOI: 10.1089/cmb.2019.0511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Abstract
Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
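To make the informal definition concrete, the sketch below computes a GTED-like value by brute force on tiny edge-labelled directed graphs. It enumerates every trail that uses each edge exactly once, joins the edge labels into a string, and takes the minimum Levenshtein distance over all pairs of strings. This is exponential in the number of edges and is not the polynomial-time linear-programming algorithm the abstract describes. The function names, the `(u, v, label)` edge format, and the choice to accept both open and closed Eulerian trails are assumptions made for illustration.

```python
def edit_distance(s, t):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (a != b)))   # substitution or match
        prev = cur
    return prev[-1]


def eulerian_label_strings(edges):
    """Label strings of all trails that use every directed edge exactly once.

    edges: list of (u, v, label) tuples (hypothetical input format).
    """
    n = len(edges)
    results = set()

    def extend(node, used, labels):
        if len(labels) == n:
            results.add("".join(labels))
            return
        for i, (u, v, lab) in enumerate(edges):
            if i not in used and u == node:
                extend(v, used | {i}, labels + [lab])

    for i, (u, v, lab) in enumerate(edges):
        extend(v, {i}, [lab])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian label strings.

    Raises ValueError if either graph has no Eulerian trail.
    """
    return min(edit_distance(s, t)
               for s in eulerian_label_strings(edges1)
               for t in eulerian_label_strings(edges2))
```

For example, two labelled 3-cycles that differ in a single edge label are at distance 1 under this sketch.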
Affiliation(s)
- Akash Shrestha
  - Department of Computer Science, Colorado State University, Fort Collins, Colorado
- Ali Sharifi-Zarchi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Hamidreza Chitsaz
  - Department of Computer Science, Colorado State University, Fort Collins, Colorado

5. Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. Quantitative Biology 2019. [DOI: 10.1007/s40484-019-0181-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/25/2022]

6. Deciphering the mitochondrial genome of Malabar snakehead, Channa diplogramma (Teleostei; Channidae). Biologia (Bratisl) 2019. [DOI: 10.2478/s11756-019-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

7. Mittal P, Jaiswal SK, Vijay N, Saxena R, Sharma VK. Comparative analysis of corrected tiger genome provides clues to its neuronal evolution. Sci Rep 2019; 9:18459. [PMID: 31804567 PMCID: PMC6895189 DOI: 10.1038/s41598-019-54838-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/14/2019] [Accepted: 11/14/2019] [Indexed: 01/01/2023]
Abstract
The availability of completed and draft genome assemblies of tiger, leopard, and other felids provides an opportunity to gain comparative insights on their unique evolutionary adaptations. However, genome-wide comparative analyses are susceptible to errors in genome sequences and thus require accurate genome assemblies for reliable evolutionary insights. In this study, while analyzing the tiger genome, we found almost one million erroneous substitutions in the coding and non-coding region of the genome affecting 4,472 genes, hence, biasing the current understanding of tiger evolution. Moreover, these errors produced several misleading observations in previous studies. Thus, to gain insights into the tiger evolution, we corrected the erroneous bases in the genome assembly and gene set of tiger using ‘SeqBug’ approach developed in this study. We sequenced the first Bengal tiger genome and transcriptome from India to validate these corrections. A comprehensive evolutionary analysis was performed using 10,920 orthologs from nine mammalian species including the corrected gene sets of tiger and leopard and using five different methods at three hierarchical levels, i.e. felids, Panthera, and tiger. The unique genetic changes in tiger revealed that the genes showing signatures of adaptation in tiger were enriched in development and neuronal functioning. Specifically, the genes belonging to the Notch signalling pathway, which is among the most conserved pathways involved in embryonic and neuronal development, were found to have significantly diverged in tiger in comparison to the other mammals. Our findings suggest the role of adaptive evolution in neuronal functions and development processes, which correlates well with the presence of exceptional traits such as sensory perception, strong neuro-muscular coordination, and hypercarnivorous behaviour in tiger.
Affiliation(s)
- Parul Mittal
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Shubham K Jaiswal
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Rituja Saxena
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India
- Vineet K Sharma
  - Metaomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal, India

8. Sritha K, Augustine J, Bhat SG. Draft genome sequence data of T-5 like Salmonella bacteriophage ФSP3 with demonstrated therapeutic potential. Data Brief 2019; 27:104606. [PMID: 31667319 PMCID: PMC6812015 DOI: 10.1016/j.dib.2019.104606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2018] [Revised: 09/16/2019] [Accepted: 09/26/2019] [Indexed: 11/17/2022]

9. Sathyajith C, Yamanoue Y, Yokobori SI, Thampy S, Vattiringal Jayadradhan RK. Mitogenome analysis of dwarf pufferfish (Carinotetraodon travancoricus) endemic to southwest India and its implications in the phylogeny of Tetraodontidae. J Genet 2019; 98:105. [PMID: 31819020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The Tetraodontidae (pufferfishes) is primarily a family of marine and estuarine fishes with a limited number of freshwater species. Freshwater invasions can be observed in South America, Southeast Asia and central Africa. In the present study, we have analysed the complete mitogenome of the freshwater pufferfish Carinotetraodon travancoricus (dwarf pufferfish or Malabar pufferfish), endemic to southwest India. The genome is 16487 bp in length and consists of 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes and one control region, like all other vertebrate mitogenomes. The protein-coding genes ranged from 165 bp (ATP synthase subunit 8) to 1812 bp (NADH dehydrogenase subunit 5) and comprised 11310 bp in total, constituting 68.5% of the complete mitogenome. Overlaps totalling 7 bp were observed among the protein-coding genes. The AT skew (0.032166) and GC skew (-0.29746) of the mitogenome indicated that the heavy strand contains nearly equal amounts of A and T, but that the overall base composition is skewed towards C. The noncoding D-loop region comprised 869 bp. The conserved motif ATGTA and its complement TACAT, associated with thermostable hairpin structure formation, were identified in the control region. The phylogenetic analysis depicted a sister group relationship of C. travancoricus with the euryhaline species Dichotomyctere nigroviridis and D. ocellatus (100% bootstrap value) rather than with the other freshwater Carinotetraodon species from Southeast Asia. The data from this study will be useful for proper identification, genetic differentiation, management and conservation of the dwarf Indian pufferfish.
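The skew values quoted above are conventionally computed as AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), where each letter is the count of that base on the strand analysed. The sketch below assumes the abstract follows this usual convention; the function name `skews` and the toy sequence are hypothetical. Under these definitions, an AT skew near zero means A and T occur in nearly equal numbers, and a negative GC skew means C outnumbers G.

```python
from collections import Counter


def skews(seq):
    """Return (AT skew, GC skew) for a nucleotide sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Returns 0.0 for a skew whose denominator would be zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# Toy sequence, not real data: A=3, T=1, G=1, C=3.
at, gc = skews("AAATGCCC")
```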
Affiliation(s)
- Chandhini Sathyajith
  - Department of Aquaculture, Kerala University of Fisheries and Ocean Studies, Panangad 682 506, India

10. Sathyajith C, Yamanoue Y, Yokobori SI, Thampy S, Vattiringal Jayadradhan RK. Mitogenome analysis of dwarf pufferfish (Carinotetraodon travancoricus) endemic to southwest India and its implications in the phylogeny of Tetraodontidae. J Genet 2019. [DOI: 10.1007/s12041-019-1151-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

11. Yang R, Santos Garcia D, Pérez Montaño F, da Silva GM, Zhao M, Jiménez Guerrero I, Rosenberg T, Chen G, Plaschkes I, Morin S, Walcott R, Burdman S. Complete Assembly of the Genome of an Acidovorax citrulli Strain Reveals a Naturally Occurring Plasmid in This Species. Front Microbiol 2019; 10:1400. [PMID: 31281298 PMCID: PMC6595937 DOI: 10.3389/fmicb.2019.01400] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/30/2018] [Accepted: 06/04/2019] [Indexed: 11/13/2022]
Abstract
Acidovorax citrulli is the causal agent of bacterial fruit blotch (BFB), a serious threat to cucurbit crop production worldwide. Based on genetic and phenotypic properties, A. citrulli strains are divided into two major groups: group I strains have been generally isolated from melon and other non-watermelon cucurbits, while group II strains are closely associated with watermelon. In a previous study, we reported the genome of the group I model strain, M6. At that time, the M6 genome was sequenced by MiSeq Illumina technology, with reads assembled into 139 contigs. Here, we report the assembly of the M6 genome following sequencing with PacBio technology. This approach not only allowed full assembly of the M6 genome, but it also revealed the occurrence of a ∼53 kb plasmid. The M6 plasmid, named pACM6, was further confirmed by plasmid extraction, Southern-blot analysis of restricted fragments and generation of M6-derivative cured strains. pACM6 occurs at low copy numbers (average of ∼4.1 ± 1.3 chromosome equivalents) in A. citrulli M6 and contains 63 open reading frames (ORFs), most of which (55.6%) encode hypothetical proteins. The plasmid contains several genes encoding type IV secretion components, and typical plasmid-borne genes involved in plasmid maintenance, replication and transfer. The plasmid also carries an operon encoding homologs of a Fic-VbhA toxin-antitoxin (TA) module. Transcriptome data from A. citrulli M6 revealed that, under the tested conditions, the genes encoding the components of this TA system are among the most highly expressed genes in pACM6. Whether this TA module plays a role in pACM6 maintenance is still to be determined. Leaf infiltration and seed transmission assays revealed that, under the tested conditions, the loss of pACM6 did not affect the virulence of A. citrulli M6. We also show that pACM6 or similar plasmids are present in several group I strains, but absent in all tested group II strains of A. citrulli.
Affiliation(s)
- Rongzhi Yang
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Diego Santos Garcia
  - Department of Entomology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Francisco Pérez Montaño
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
  - Department of Microbiology, University of Seville, Seville, Spain
- Gustavo Mateus da Silva
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Mei Zhao
  - Department of Plant Pathology, University of Georgia, Athens, GA, United States
- Irene Jiménez Guerrero
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Tally Rosenberg
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Gong Chen
  - Department of Plant Pathology, University of Georgia, Athens, GA, United States
- Inbar Plaschkes
  - Bioinformatics Unit, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Shai Morin
  - Department of Entomology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Ron Walcott
  - Department of Plant Pathology, University of Georgia, Athens, GA, United States
- Saul Burdman
  - Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel

12. Elayadeth‐Meethal M, Thazhathu Veettil A, Maloney SK, Hawkins N, Misselbrook TH, Sejian V, Rivero MJ, Lee MRF. Size does matter: Parallel evolution of adaptive thermal tolerance and body size facilitates adaptation to climate change in domestic cattle. Ecol Evol 2018; 8:10608-10620. [PMID: 30464832 PMCID: PMC6238145 DOI: 10.1002/ece3.4550] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/29/2018] [Revised: 08/01/2018] [Accepted: 08/04/2018] [Indexed: 01/18/2023]
Abstract
The adaptive potential of livestock under a warming climate is increasingly relevant in relation to the growing pressure of global food security. Studies on heat tolerance demonstrate the interplay of adaptation and acclimatization in functional traits, for example, a reduction in body size and enhanced tolerance in response to a warming climate. However, current lack of understanding of functional traits and phylogenetic history among phenotypically distinct populations constrains predictions of climate change impact. Here, we demonstrate evidence of parallel evolution in adaptive tolerance to heat stress in dwarf cattle breeds (DCB, Bos taurus indicus) and compare their thermoregulatory responses with those in standard size cattle breeds (SCB, crossbred, Bos taurus indicus × Bos taurus taurus). We measured vital physiological, hematological, biochemical, and gene expression changes in DCB and SCB and compared the molecular phylogeny using mitochondrial genome (mitogenome) analysis. Our results show that SCB can acclimatize in the short term to higher temperatures but reach their tolerance limit under prevailing tropical conditions, while DCB is adapted to the warmer climate. Increased hemoglobin concentration, reduced cellular size, and smaller body size enhance thermal tolerance. Mitogenome analysis revealed that different lineages of DCB have evolved reduced size independently, as a parallel adaptation to heat stress. The results illustrate mechanistic ways of dwarfing, body size-dependent tolerance, and differential fitness in a large mammal species under harsh field conditions, providing a background for comparing similar populations during global climate change. These demonstrate the value of studies combining functional, physiological, and evolutionary approaches to delineate adaptive potential and plasticity in domestic species. 
We thus highlight the value of locally adapted breeds as a reservoir of genetic variation contributing to the global domestic genetic resource pool that will become increasingly important for livestock production systems under a warming climate.
Affiliation(s)
- Muhammed Elayadeth‐Meethal
  - Kerala Veterinary and Animal Sciences University, Wayanad, India
  - School of Human Sciences, University of Western Australia, Crawley, Australia
  - Rothamsted Research, North Wyke, UK
- Shane K. Maloney
  - School of Human Sciences, University of Western Australia, Crawley, Australia
- Veerasamy Sejian
  - ICAR‐National Institute of Animal Nutrition and Physiology, Bangalore, India
- Michael R. F. Lee
  - Rothamsted Research, North Wyke, UK
  - Bristol Veterinary School, University of Bristol, Langford, UK

13. Wüthrich D, Irmler S, Berthoud H, Guggenbühl B, Eugster E, Bruggmann R. Conversion of Methionine to Cysteine in Lactobacillus paracasei Depends on the Highly Mobile cysK-ctl-cysE Gene Cluster. Front Microbiol 2018; 9:2415. [PMID: 30386310 PMCID: PMC6200037 DOI: 10.3389/fmicb.2018.02415] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/31/2018] [Accepted: 09/20/2018] [Indexed: 11/23/2022]
Abstract
Milk and dairy products are rich in nutrients and are therefore habitats for various microbiomes. However, the composition of nutrients can be quite diverse, in particular among the sulfur containing amino acids. In milk, methionine is present in a 25-fold higher abundance than cysteine. Interestingly, a fraction of strains of the species L. paracasei – a flavor-enhancing adjunct culture species – can grow in medium with methionine as the sole sulfur source. In this study, we focus on genomic and evolutionary aspects of sulfur dependence in L. paracasei strains. From 24 selected L. paracasei strains, 16 strains can grow in medium with methionine as sole sulfur source. We sequenced these strains to perform gene-trait matching. We found that one gene cluster – consisting of a cysteine synthase, a cystathionine lyase, and a serine acetyltransferase – is present in all strains that grow in medium with methionine as sole sulfur source. In contrast, strains that depend on other sulfur sources do not have this gene cluster. We expanded the study and searched for this gene cluster in other species and detected it in the genomes of many bacteria species used in the food production. The comparison to these species showed that two different versions of the gene cluster exist in L. paracasei which were likely gained in two distinct events of horizontal gene transfer. Additionally, the comparison of 62 L. paracasei genomes and the two versions of the gene cluster revealed that this gene cluster is mobile within the species.
Affiliation(s)
- Daniel Wüthrich
  - Interfaculty Bioinformatics Unit and Swiss Institute of Bioinformatics, University of Bern, Bern, Switzerland
- Elisabeth Eugster
  - School of Agricultural, Forest and Food Sciences HAFL, Bern University of Applied Sciences, Zollikofen, Switzerland
- Rémy Bruggmann
  - Interfaculty Bioinformatics Unit and Swiss Institute of Bioinformatics, University of Bern, Bern, Switzerland

14. Sancho R, Cantalapiedra CP, López-Alvarez D, Gordon SP, Vogel JP, Catalán P, Contreras-Moreira B. Comparative plastome genomics and phylogenomics of Brachypodium: flowering time signatures, introgression and recombination in recently diverged ecotypes. The New Phytologist 2018; 218:1631-1644. [PMID: 29206296 DOI: 10.1111/nph.14926] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 10/10/2016] [Accepted: 03/03/2017] [Indexed: 05/24/2023]
Abstract
Few pan-genomic studies have been conducted in plants, and none of them have focused on the intraspecific diversity and evolution of their plastid genomes. We address this issue in Brachypodium distachyon and its close relatives B. stacei and B. hybridum, for which a large genomic data set has been compiled. We analyze inter- and intraspecific plastid comparative genomics and phylogenomic relationships within a family-wide framework. Major indel differences were detected between Brachypodium plastomes. Within B. distachyon, we detected two main lineages, a mostly Extremely Delayed Flowering (EDF+) clade and a mostly Spanish (S+) - Turkish (T+) clade, plus nine chloroplast capture and two plastid DNA (ptDNA) introgression and micro-recombination events. Early Oligocene (30.9 million yr ago (Ma)) and Late Miocene (10.1 Ma) divergence times were inferred for the respective stem and crown nodes of Brachypodium and a very recent Mid-Pleistocene (0.9 Ma) time for the B. distachyon split. Flowering time variation is a main factor driving rapid intraspecific divergence in B. distachyon, although it is counterbalanced by repeated introgression between previously isolated lineages. Swapping of plastomes between the three different genomic groups, EDF+, T+, S+, probably resulted from random backcrossing followed by stabilization through selection pressure.
Affiliation(s)
- Rubén Sancho
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
- Carlos P Cantalapiedra
  - Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei-Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
- Diana López-Alvarez
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
- Sean P Gordon
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
- John P Vogel
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Pilar Catalán
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
- Bruno Contreras-Moreira
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
  - Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei-Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
  - Fundación ARAID, Zaragoza, Spain

15. Genomics of Salmonella phage ΦStp1: candidate bacteriophage for biocontrol. Virus Genes 2018; 54:311-318. [DOI: 10.1007/s11262-018-1538-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/11/2017] [Accepted: 02/01/2018] [Indexed: 01/21/2023]

16. Cha SW, Bonissone S, Na S, Pevzner PA, Bafna V. The Antibody Repertoire of Colorectal Cancer. Mol Cell Proteomics 2017; 16:2111-2124. [PMID: 29046389 PMCID: PMC5724175 DOI: 10.1074/mcp.ra117.000397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/11/2017] [Indexed: 12/31/2022]
Abstract
Immunotherapy is becoming increasingly important in the fight against cancers, using and manipulating the body's immune response to treat tumors. Understanding the immune repertoire (the collection of immunological proteins) of treated and untreated cells is possible at the genomic level but technically difficult at the protein level. Standard protein databases do not include the highly divergent sequences of somatically rearranged immunoglobulin genes, which may lead to misidentifications in a mass spectrometry search. We introduce a novel proteogenomic approach, AbScan, to identify these highly variable antibody peptides, by developing a customized antibody database construction method using RNA-seq reads aligned to immunoglobulin (Ig) genes. AbScan starts by filtering transcript (RNA-seq) reads that match the template for Ig genes. The retained reads are used to construct a repertoire graph using the "split" de Bruijn graph: a graph structure that improves on the standard de Bruijn graph to capture the high diversity of Ig genes in a compact manner. AbScan corrects for sequencing errors, and converts the graph to a format suitable for searching with MS/MS search tools. We used AbScan to create an antibody database from 90 RNA-seq colorectal tumor samples. Next, we used proteogenomic analysis to search MS/MS spectra of matched colorectal samples from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) against the AbScan-generated database. AbScan identified 1,940 distinct antibody peptides. Correlating with previously identified Single Amino-Acid Variants (SAAVs) in the tumor samples, we identified 163 pairs (antibody peptide, SAAV) with a significant co-occurrence pattern in the 90 samples. The presence of coexpressed antibody and mutated peptides was correlated with survival time of the individuals. Our results suggest that AbScan (https://github.com/csw407/AbScan.git) is an effective tool for a proteomic exploration of the immune response in cancers.
Affiliation(s)
- Seong Won Cha
- Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, California
| | | | - Seungjin Na
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
| | - Vineet Bafna
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92037
| |
|
17
|
GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res 2017; 27:2050-2060. [PMID: 29097403 PMCID: PMC5741059 DOI: 10.1101/gr.222109.117] [Citation(s) in RCA: 226] [Impact Index Per Article: 28.3] [Received: 02/24/2017] [Accepted: 09/14/2017] [Indexed: 01/08/2023]
Abstract
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
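To illustrate the general idea behind a positional de Bruijn graph, the sketch below treats each node as a (k-mer, position) pair, so identical k-mers anchored at different positions are kept apart. This is a toy illustration and not the GRIDSS implementation: the function names, the `anchored_reads` input, and the assumption that every read arrives with a known anchor position are all hypothetical.

```python
from collections import Counter

def positional_kmers(read, start, k):
    """Yield (k-mer, position) nodes for a read whose first base sits at `start`."""
    for i in range(len(read) - k + 1):
        yield (read[i:i + k], start + i)

def build_positional_dbg(anchored_reads, k):
    """Count read support for each positional node and each edge between consecutive nodes."""
    nodes, edges = Counter(), Counter()
    for read, start in anchored_reads:
        path = list(positional_kmers(read, start, k))
        nodes.update(path)
        edges.update(zip(path, path[1:]))
    return nodes, edges

# Toy input: (sequence, assumed anchor position) pairs.
reads = [("ACGTACG", 100), ("GTACGTT", 102)]
nodes, edges = build_positional_dbg(reads, k=4)

# "ACGT" occurs at positions 100 and 104 and stays as two separate nodes.
repeat_nodes = sorted(n for n in nodes if n[0] == "ACGT")
```

In a plain de Bruijn graph the two "ACGT" occurrences would collapse into one node and create a cycle; keeping the position in the node key avoids that in this toy case.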
|
18
|
Muggli MD, Bowe A, Noyes NR, Morley PS, Belk KE, Raymond R, Gagie T, Puglisi SJ, Boucher C. Succinct colored de Bruijn graphs. Bioinformatics 2017; 33:3181-3187. [PMID: 28200001 PMCID: PMC5872255 DOI: 10.1093/bioinformatics/btx067] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 09/12/2016] [Revised: 01/16/2017] [Accepted: 02/10/2017] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of the classic de Bruijn graph, which is aimed at 'detecting and genotyping simple and complex genetic variants in an individual or population'. Because they are intended to be applied to massive population level data, it is essential that the graphs be represented efficiently. Unfortunately, current succinct de Bruijn graph representations are not directly applicable to the colored de Bruijn graph, which requires additional information to be succinctly encoded as well as support for non-standard traversal operations. RESULTS: Our data structure dramatically reduces the amount of memory required to store and use the colored de Bruijn graph, with some penalty to runtime, allowing it to be applied in much larger and more ambitious sequence projects than was previously possible. AVAILABILITY AND IMPLEMENTATION: https://github.com/cosmo-team/cosmo/tree/VARI. CONTACT: martin.muggli@colostate.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
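As a rough picture of what a colored de Bruijn graph stores, the sketch below maps every k-mer to the set of samples ("colors") in which it occurs and allows traversal restricted to one color. It uses an ordinary hash map, so it is the uncompressed baseline rather than the succinct structure the abstract describes; the function names and the toy samples are assumptions made for illustration.

```python
from collections import defaultdict

def build_colored_dbg(samples, k):
    """Map each k-mer to the set of sample names (colors) that contain it."""
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(name)
    return colors

def successors(colors, kmer, color=None):
    """Yield k-mers that can follow `kmer`, optionally restricted to one color."""
    for base in "ACGT":
        nxt = kmer[1:] + base
        if nxt in colors and (color is None or color in colors[nxt]):
            yield nxt

# Two toy samples that differ by a single base.
samples = {"ref": ["ACGTAC"], "alt": ["ACGGAC"]}
graph = build_colored_dbg(samples, k=3)

shared = sorted(kmer for kmer, c in graph.items() if len(c) == 2)
branches = list(successors(graph, "ACG"))
ref_only = list(successors(graph, "ACG", color="ref"))
```

The node "ACG" is shared by both samples and branches into one successor per color, which is the kind of bubble that signals a variant between samples.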
Affiliation(s)
- Martin D Muggli
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Alexander Bowe
- Department of Informatics, National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
| | | | | | - Keith E Belk
- Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
| | - Robert Raymond
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Travis Gagie
- School of Computer Science and Telecommunications, Diego Portales University and CEBIB, Santiago, Chile
| | - Simon J Puglisi
- Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Christina Boucher
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| |
|
19
|
Bao E, Song C, Lan L. ReMILO: reference assisted misassembly detection algorithm using short and long reads. Bioinformatics 2017; 34:24-32. [DOI: 10.1093/bioinformatics/btx524] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/05/2017] [Accepted: 08/15/2017] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ergude Bao
- Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing, China
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
| | - Changjin Song
- Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing, China
| | - Lingxiao Lan
- Software Engineering Research Center, School of Software Engineering, Beijing Jiaotong University, Beijing, China
| |
|
20
|
Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017; 40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022] Open
Abstract
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations imposed by short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. Previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even a bacterial one, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Affiliation(s)
- Frederico Schmitt Kremer
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
| | - Alan John Alexander McBride
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
| | - Luciano da Silva Pinto
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
| |
|
21
|
Genome Sequence of "Candidatus Carsonella ruddii" Strain BC, a Nutritional Endosymbiont of Bactericera cockerelli. GENOME ANNOUNCEMENTS 2017; 5:5/17/e00236-17. [PMID: 28450512 PMCID: PMC5408110 DOI: 10.1128/genomea.00236-17] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
Here, we report the genome of “Candidatus Carsonella ruddii” strain BC, a nutritional endosymbiont of the tomato psyllid Bactericera cockerelli. The 173,802-bp genome contains 198 protein-coding genes, with a G+C content of 14.8%.
|
22
|
Abstract
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
|
23
|
Ye C, Ma ZS. Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads. PeerJ 2016; 4:e2016. [PMID: 27330851 PMCID: PMC4906657 DOI: 10.7717/peerj.2016] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/21/2015] [Accepted: 04/15/2016] [Indexed: 11/20/2022] Open
Abstract
Motivation. Third-generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15–40%, significantly higher than those of the prevalent next-generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear-complexity consensus algorithm, Sparc, to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path through a sparsity-induced reweighted graph, which approximates the most likely genome sequence, is taken as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments with Sparc show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with error rate <0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve a similar error rate when combined with NGS data. Compared with the existing approaches, Sparc calculates the consensus with higher accuracy, and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.
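The heaviest-path idea can be sketched with a small position-indexed k-mer graph: edges are weighted by the number of reads supporting them, and the consensus is read off the path with the largest total weight. This is a simplified illustration, not the Sparc algorithm itself: it uses every k-mer rather than a sparse subset, applies no reweighting, and assumes each read is already aligned with a known start offset (the `anchored_reads` input and the function name are hypothetical).

```python
from collections import defaultdict

def heaviest_path_consensus(anchored_reads, k):
    """Return the consensus spelled by the heaviest path in a positional k-mer graph."""
    weight = defaultdict(int)  # edge (u, v) -> number of supporting reads
    for read, start in anchored_reads:
        nodes = [(start + i, read[i:i + k]) for i in range(len(read) - k + 1)]
        for u, v in zip(nodes, nodes[1:]):
            weight[(u, v)] += 1

    # Every edge runs from position p to p + 1, so ordering edges by the
    # source position visits them in topological order.
    best = defaultdict(int)  # node -> weight of the heaviest path ending there
    prev = {}
    for (u, v), w in sorted(weight.items(), key=lambda e: e[0][0][0]):
        if best[u] + w > best[v]:
            best[v] = best[u] + w
            prev[v] = u

    # Walk back from the best-scoring end node.
    path = [max(best, key=best.get)]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    path.reverse()
    return path[0][1] + "".join(kmer[-1] for _, kmer in path[1:])

# Three toy reads anchored at offset 0; the third carries one substitution.
reads = [("ACGTAC", 0), ("ACGTAC", 0), ("ACCTAC", 0)]
consensus = heaviest_path_consensus(reads, k=3)
```

Because two of the three reads agree, the edges along the correct branch carry more weight and the erroneous branch is outvoted.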
Affiliation(s)
- Chengxi Ye
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Zhanshan Sam Ma
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| |
|
24
|
Draft Genome Sequence of a Pseudomonas aeruginosa Strain Able To Decompose N,N-Dimethyl Formamide. GENOME ANNOUNCEMENTS 2016; 4:4/1/e01609-15. [PMID: 26847883 PMCID: PMC4742680 DOI: 10.1128/genomea.01609-15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Pseudomonas aeruginosa is a Gram-negative bacterium, which uses a variety of organic chemicals as carbon sources. Here, we report the genome sequence of the Cu1510 isolate from wastewater containing a high concentration of N,N-dimethyl formamide.
|
25
|
Abstract
Bacterial genome sequencing is now an affordable choice for many laboratories for applications in research, diagnostic, and clinical microbiology. Nowadays, an overabundance of tools is available for genomic data analysis. However, tools differ in algorithms, languages, hardware requirements, and user interface, and combining them, as is necessary for sequence data interpretation, often requires (bio)informatics skills which can be difficult to find in many laboratories. In addition, multiple data sources, exceedingly large dataset sizes, and increasing computational complexity further challenge the accessibility, reproducibility, and transparency of the entire process. In this chapter we will cover the main bioinformatics steps required for a complete bacterial genome analysis using next-generation sequencing data, from the raw sequence data to assembled and annotated genomes. All the tools described are available in the Orione framework (http://orione.crs4.it), which uniquely combines in a transparent way the most used open source bioinformatics tools for microbiology, allowing microbiologists without any specific hardware or informatics skills to conduct data-intensive computational analyses from quality control to microbial gene annotation.
Affiliation(s)
- Massimiliano Orsini
- CRS4, Science and Technology Park Polaris, Piscina Manna, 09010, Pula, CA, Italy
| | - Gianmauro Cuccuru
- CRS4, Science and Technology Park Polaris, Piscina Manna, 09010, Pula, CA, Italy
| | - Paolo Uva
- CRS4, Science and Technology Park Polaris, Piscina Manna, 09010, Pula, CA, Italy
| | - Giorgio Fotia
- CRS4, Science and Technology Park Polaris, Piscina Manna, 09010, Pula, CA, Italy.
| |
|
26
|
Muggli MD, Puglisi SJ, Ronen R, Boucher C. Misassembly detection using paired-end sequence reads and optical mapping data. Bioinformatics 2015; 31:i80-8. [PMID: 26072512 PMCID: PMC4542784 DOI: 10.1093/bioinformatics/btv262] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 11/22/2022] Open
Abstract
Motivation: A crucial problem in genome assembly is the discovery and correction of misassembly errors in draft genomes. We develop a method called misSEQuel that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data. Our method also fulfills the critical need for open source computational methods for analyzing optical mapping data. We apply our method to various assemblies of the loblolly pine, Francisella tularensis, rice and budgerigar genomes. We generated and used simulated optical mapping data for loblolly pine and F. tularensis and used real optical mapping data for rice and budgerigar. Results: Our results demonstrate that we detect more than 54% of extensively misassembled contigs and more than 60% of locally misassembled contigs in assemblies of F. tularensis and between 31% and 100% of extensively misassembled contigs and between 57% and 73% of locally misassembled contigs in assemblies of loblolly pine. Using the real optical mapping data, we correctly identified 75% of extensively misassembled contigs and 100% of locally misassembled contigs in rice, and 77% of extensively misassembled contigs and 80% of locally misassembled contigs in budgerigar. Availability and implementation: misSEQuel can be used as a post-processing step in combination with any genome assembler and is freely available at http://www.cs.colostate.edu/seq/. Contact: muggli@cs.colostate.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Martin D Muggli
- Department of Computer Science, Colorado State University, Fort Collins, CO 80526, USA, Department of Computer Science, University of Helsinki, Finland and Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Simon J Puglisi
- Department of Computer Science, Colorado State University, Fort Collins, CO 80526, USA, Department of Computer Science, University of Helsinki, Finland and Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Roy Ronen
- Department of Computer Science, Colorado State University, Fort Collins, CO 80526, USA, Department of Computer Science, University of Helsinki, Finland and Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| | - Christina Boucher
- Department of Computer Science, Colorado State University, Fort Collins, CO 80526, USA, Department of Computer Science, University of Helsinki, Finland and Bioinformatics Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
| |
|
27
|
Horn F, Linde J, Mattern DJ, Walther G, Guthke R, Brakhage AA, Valiante V. Draft Genome Sequence of the Fungus Penicillium brasilianum MG11. GENOME ANNOUNCEMENTS 2015; 3:e00724-15. [PMID: 26337871 PMCID: PMC4559720 DOI: 10.1128/genomea.00724-15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/26/2015] [Accepted: 07/24/2015] [Indexed: 02/02/2023]
Abstract
The genus Penicillium belongs to the phylum Ascomycota and includes a variety of fungal species important for food and drug production. We report the draft genome sequence of Penicillium brasilianum MG11. This strain was isolated from soil, and it was reported to produce different secondary metabolites.
Affiliation(s)
- Fabian Horn
- Department of Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| | - Jörg Linde
- Department of Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| | - Derek J Mattern
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| | - Grit Walther
- National Center for Invasive Mycoses, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| | - Reinhard Guthke
- Department of Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| | - Axel A Brakhage
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany; Friedrich Schiller University, Institute for Microbiology, Jena, Germany
| | - Vito Valiante
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany; Leibniz Junior Research Group-Biobricks of Microbial Natural Product Syntheses, Leibniz Institute for Natural Product Research and Infection Biology, Hans Knöll Institute (HKI), Jena, Germany
| |
|
28
|
Complete Genome Sequence of Leptospira interrogans Serovar Bratislava, Strain PigK151. GENOME ANNOUNCEMENTS 2015; 3:3/3/e00678-15. [PMID: 26112787 PMCID: PMC4481285 DOI: 10.1128/genomea.00678-15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract
Leptospira interrogans serovar Bratislava infection occurs in multiple domestic and wildlife species and is associated with poor reproductive performance in swine and horses. We present the complete genome assembly of strain PigK151 comprising two chromosomes, CI (4.457 Mbp) and CII (358 kbp).
|
29
|
Hu S, Sablok G, Wang B, Qu D, Barbaro E, Viola R, Li M, Varotto C. Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats. BMC Genomics 2015; 16:306. [PMID: 25887666 PMCID: PMC4446112 DOI: 10.1186/s12864-015-1498-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 11/22/2014] [Accepted: 03/27/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Plastid genomes, also known as plastomes, are shaped by the selective forces acting on the fundamental cellular functions they code for and thus they are expected to preserve signatures of the adaptive path undertaken by different plant species during evolution. To identify molecular signatures of positive selection associated with adaptation to contrasting ecological niches, we sequenced with Solexa technology the plastomes of two congeneric Brassicaceae species with different habitat preference, Cardamine resedifolia and Cardamine impatiens. Results: Following in-depth characterization of plastome organization, repeat patterns and gene space, the comparison of the newly sequenced plastomes with each other and with 15 fully sequenced Brassicaceae plastomes publicly available in GenBank uncovered dynamic variation of the IR boundaries in the Cardamine lineage. We further detected signatures of positive selection in ten of the 75 protein-coding genes of the examined plastomes, identifying a range of chloroplast functions putatively involved in adaptive processes within the family. For instance, the three residues found to be under positive selection in RUBISCO could possibly be involved in the modulation of RUBISCO aggregation/activation and enzymatic specificity in Brassicaceae. In addition, our results point to differential evolutionary rates in Cardamine plastomes. Conclusions: Overall our results support the existence of wider signatures of positive selection in the plastome of C. resedifolia, possibly as a consequence of adaptation to high altitude environments. We further provide a first characterization of the selective patterns shaping the Brassicaceae plastomes, which could help elucidate the driving forces underlying adaptation and evolution in this important plant family. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1498-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shiliang Hu
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Gaurav Sablok
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Bo Wang
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Dong Qu
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy; College of Horticulture, Northwest Agricultural and Forest University, 712100, Yangling, Shaanxi, PR China.
| | - Enrico Barbaro
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Roberto Viola
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Mingai Li
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | - Claudio Varotto
- Ecogenomics Laboratory, Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| |
|
30
|
Marinier E, Brown DG, McConkey BJ. Pollux: platform independent error correction of single and mixed genomes. BMC Bioinformatics 2015; 16:10. [PMID: 25592313 PMCID: PMC4307147 DOI: 10.1186/s12859-014-0435-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 08/21/2014] [Accepted: 12/17/2014] [Indexed: 12/13/2022] Open
Abstract
Background: Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads. Results: We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, and 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times rarer than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software. Conclusions: Pollux is highly effective at correcting errors across platforms, and is consistently able to perform as well as or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly.
Affiliation(s)
- Eric Marinier
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada.
| | - Daniel G Brown
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada.
| | - Brendan J McConkey
- Department of Biology, University of Waterloo, 200 University Ave W, N2L3G1 Waterloo, Canada.
| |
|
31
|
|
32
|
Abstract
The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse (e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics), the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.
Affiliation(s)
- Thomas D Otto
- Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| |
|
33
|
Bratcher HB, Corton C, Jolley KA, Parkhill J, Maiden MCJ. A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes. BMC Genomics 2014; 15:1138. [PMID: 25523208 PMCID: PMC4377854 DOI: 10.1186/1471-2164-15-1138] [Citation(s) in RCA: 136] [Impact Index Per Article: 12.4] [Received: 10/02/2014] [Accepted: 12/04/2014] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND: Highly parallel, 'second generation' sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary. RESULTS: The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses and the data are publicly available in the PubMLST Neisseria database. CONCLUSIONS: The de novo assembly, combined with automated gene-by-gene annotation, generates high quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease causing lineages.
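The "present in at least 95% of the population" criterion for a core genome can be expressed as a simple presence count per locus. The sketch below is only an illustration of that threshold rule, not the Genome Comparator tool; the function name, the `presence` mapping, and the isolate and locus names are hypothetical placeholders.

```python
def core_loci(presence, min_percent=95):
    """Return loci present in at least `min_percent` percent of isolates."""
    n_isolates = len(presence)
    counts = {}
    for loci in presence.values():
        for locus in set(loci):
            counts[locus] = counts.get(locus, 0) + 1
    # Integer comparison avoids floating-point rounding at the threshold.
    return sorted(
        locus for locus, c in counts.items()
        if c * 100 >= min_percent * n_isolates
    )

# Toy data: 20 hypothetical isolates with placeholder locus names.
presence = {f"isolate_{i}": {"locus_A"} for i in range(20)}
for i in range(19):
    presence[f"isolate_{i}"].add("locus_B")  # present in 19/20 = 95%
for i in range(18):
    presence[f"isolate_{i}"].add("locus_C")  # present in 18/20 = 90%

core = core_loci(presence)
```

Here `locus_B` sits exactly on the 95% boundary and is kept, while `locus_C` at 90% is excluded.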
|
34
|
Zehr ES, Bayles DO, Boatwright WD, Tabatabai LB, Register KB. Complete genome sequence of Ornithobacterium rhinotracheale strain ORT-UMN 88. Stand Genomic Sci 2014; 9:16. [PMID: 25780507 PMCID: PMC4334632 DOI: 10.1186/1944-3277-9-16] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/05/2014] [Accepted: 10/29/2014] [Indexed: 11/24/2022] Open
Abstract
Ornithobacterium rhinotracheale strain ORT-UMN 88 is a Gram-negative, pleomorphic, rod-shaped bacterium and an etiologic agent of pneumonia and airsacculitis in poultry. It is a member of the family Flavobacteriaceae of the phylum Bacteroidetes. O. rhinotracheale strain ORT-UMN 88 was isolated from the pneumonic lung of a turkey in 1995. It was the isolate first used to experimentally reproduce disease in turkeys and has since been the focus of investigations characterizing potential virulence factors of the bacterium. The genome of O. rhinotracheale strain ORT-UMN 88 consists of a circular chromosome of 2,397,867 bp with a total of 2300 protein-coding genes, nine RNA genes, and one noncoding RNA gene. A companion paper in this issue of SIGS reports the non-contiguous finished genome sequence of an additional strain of O. rhinotracheale, isolated in 2006.
Affiliation(s)
- Emilie S Zehr
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- Darrell O Bayles
  - Infectious Bacterial Diseases Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- William D Boatwright
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- Louisa B Tabatabai
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Karen B Register
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
|
35
|
Zehr ES, Bayles DO, Boatwright WD, Tabatabai LB, Register KB. Non-contiguous finished genome sequence of Ornithobacterium rhinotracheale strain H06-030791. Stand Genomic Sci 2014; 9:14. [PMID: 25780505 PMCID: PMC4334941 DOI: 10.1186/1944-3277-9-14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/05/2014] [Accepted: 10/02/2014] [Indexed: 12/02/2022] Open
Abstract
The Gram-negative, pleomorphic, rod-shaped bacterium Ornithobacterium rhinotracheale is a cause of pneumonia and airsacculitis in poultry. It is a member of the family Flavobacteriaceae of the phylum “Bacteroidetes”. O. rhinotracheale strain H06-030791 was isolated from the lung of a turkey in North Carolina in 2006. Its genome consists of a circular chromosome of 2,319,034 bp in length with a total of 2243 protein-coding genes and nine RNA genes. Genome sequences are available for two additional strains of O. rhinotracheale, isolated in 1988 and 1995, the latter described in a companion genome report in this issue of SIGS. The genome sequence of O. rhinotracheale strain H06-030791, a more contemporary isolate, will be of value in establishing core and pan-genomes for O. rhinotracheale and elucidating its evolutionary history.
Affiliation(s)
- Emilie S Zehr
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- Darrell O Bayles
  - Infectious Bacterial Diseases Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- William D Boatwright
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
- Louisa B Tabatabai
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Karen B Register
  - Ruminant Diseases and Immunology Research Unit, U.S. Department of Agriculture, Agricultural Research Service, National Animal Disease Center, Ames, IA, USA
|
36
|
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014; 9:e112963. [PMID: 25409509 PMCID: PMC4237348 DOI: 10.1371/journal.pone.0112963] [Citation(s) in RCA: 5803] [Impact Index Per Article: 527.5] [Received: 08/25/2014] [Accepted: 10/16/2014] [Indexed: 02/06/2023] Open
Abstract
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
Affiliation(s)
- Bruce J. Walker
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - * E-mail: (BJW); (AME)
- Thomas Abeel
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - VIB Department of Plant Systems Biology, Ghent University, Ghent, Belgium
- Terrance Shea
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Margaret Priest
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Amr Abouelliel
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sharadha Sakthikumar
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Christina A. Cuomo
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Qiandong Zeng
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jennifer Wortman
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sarah K. Young
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Ashlee M. Earl
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
  - * E-mail: (BJW); (AME)
|
37
|
Haugum K, Johansen J, Gabrielsen C, Brandal LT, Bergh K, Ussery DW, Drabløs F, Afset JE. Comparative genomics to delineate pathogenic potential in non-O157 Shiga toxin-producing Escherichia coli (STEC) from patients with and without haemolytic uremic syndrome (HUS) in Norway. PLoS One 2014; 9:e111788. [PMID: 25360710 PMCID: PMC4216125 DOI: 10.1371/journal.pone.0111788] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 07/17/2014] [Accepted: 09/30/2014] [Indexed: 11/19/2022] Open
Abstract
Shiga toxin-producing Escherichia coli (STEC) cause infections in humans ranging from asymptomatic carriage to bloody diarrhoea and haemolytic uremic syndrome (HUS). Here we present whole genome comparison of Norwegian non-O157 STEC strains with the aim to distinguish between strains with the potential to cause HUS and less virulent strains. Whole genome sequencing and comparisons were performed across 95 non-O157 STEC strains. Twenty-three of these were classified as HUS-associated, including strains from patients with HUS (n = 19) and persons with an epidemiological link to a HUS-case (n = 4). Genomic comparison revealed considerable heterogeneity in gene content across the 95 STEC strains. A clear difference in gene profile was observed between strains with and without the Locus of Enterocyte Effacement (LEE) pathogenicity island. Phylogenetic analysis of the core genome showed high degree of diversity among the STEC strains, but all HUS-associated STEC strains were distributed in two distinct clusters within phylogroup B1. However, non-HUS strains were also found in these clusters. A number of accessory genes were found to be significantly overrepresented among HUS-associated STEC, but none of them were unique to this group of strains, suggesting that different sets of genes may contribute to the pathogenic potential in different phylogenetic STEC lineages. In this study we were not able to clearly distinguish between HUS-associated and non-HUS non-O157 STEC by extensive genome comparisons. Our results indicate that STECs from different phylogenetic backgrounds have independently acquired virulence genes that determine pathogenic potential, and that the content of such genes is overlapping between HUS-associated and non-HUS strains.
Affiliation(s)
- Kjersti Haugum
  - Department of Laboratory Medicine, Children’s and Women’s Health, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - * E-mail:
- Jostein Johansen
  - Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Christina Gabrielsen
  - Department of Laboratory Medicine, Children’s and Women’s Health, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Lin T. Brandal
  - Department of Foodborne Infections, Norwegian Institute of Public Health, Oslo, Norway
- Kåre Bergh
  - Department of Laboratory Medicine, Children’s and Women’s Health, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Medical Microbiology, St. Olavs University Hospital, Trondheim, Norway
- David W. Ussery
  - Biosciences Division, Oak Ridge National Labs, Oak Ridge, Tennessee, United States of America
- Finn Drabløs
  - Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Jan Egil Afset
  - Department of Laboratory Medicine, Children’s and Women’s Health, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Medical Microbiology, St. Olavs University Hospital, Trondheim, Norway
|
38
|
Bao E, Jiang T, Girke T. AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references. Bioinformatics 2014; 30:i319-i328. [PMID: 24932000 PMCID: PMC4058956 DOI: 10.1093/bioinformatics/btu291] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023] Open
Abstract
Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species. Results: Here we introduce AlignGraph, an algorithm for extending and joining de novo-assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and preassembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the PE multipositional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7–62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9–94.5% and 80.3–165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph’s efficiency in improving genome assemblies by taking advantage of closely related references. Availability and implementation: The AlignGraph software can be downloaded for free from this site: https://github.com/baoe/AlignGraph. Contact: thomas.girke@ucr.edu
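The improvements in this abstract are reported in terms of N50, which may be unfamiliar to some readers. The sketch below shows the standard definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length). It is a generic illustration, not code from AlignGraph, and the contig lengths are made-up values.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    fragmented = [80, 70, 50, 40, 30, 20]   # hypothetical contig lengths
    joined = [150, 50, 40, 30, 20]          # same bases after joining two contigs
    print(n50(fragmented), n50(joined))
```

Because joining or extending contigs leaves the total length roughly unchanged while lengthening the largest pieces, N50 rises when an assembly becomes more contiguous, which is why it is a common summary metric for tools of this kind.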
Affiliation(s)
- Ergude Bao
  - Department of Computer Science and Engineering and Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Tao Jiang
  - Department of Computer Science and Engineering and Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Thomas Girke
  - Department of Computer Science and Engineering and Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
|
39
|
Linde J, Schwartze V, Binder U, Lass-Flörl C, Voigt K, Horn F. De Novo Whole-Genome Sequence and Genome Annotation of Lichtheimia ramosa. Genome Announc 2014; 2:e00888-14. [PMID: 25212617 PMCID: PMC4161746 DOI: 10.1128/genomea.00888-14] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 08/07/2014] [Accepted: 08/11/2014] [Indexed: 11/20/2022]
Abstract
We report the annotated draft genome sequence of Lichtheimia ramosa (JMRC FSU:6197). It has been reported to be a causative organism of mucormycosis, a rare but rapidly progressive infection in immunocompromised humans. The functionally annotated genomic sequence consists of 74 scaffolds with a total number of 11,510 genes.
Affiliation(s)
- Jörg Linde
  - Systems Biology/Bioinformatics, Hans-Knöll-Institut, Jena, Germany
- Volker Schwartze
  - Jena Microbial Resource Collection, Hans-Knöll-Institut, Jena, Germany
- Ulrike Binder
  - Division of Hygiene and Medical Microbiology, Innsbruck Medical University, Innsbruck, Austria
- Cornelia Lass-Flörl
  - Division of Hygiene and Medical Microbiology, Innsbruck Medical University, Innsbruck, Austria
- Kerstin Voigt
  - Jena Microbial Resource Collection, Hans-Knöll-Institut, Jena, Germany
- Fabian Horn
  - Systems Biology/Bioinformatics, Hans-Knöll-Institut, Jena, Germany
|
40
|
Cuccuru G, Orsini M, Pinna A, Sbardellati A, Soranzo N, Travaglione A, Uva P, Zanetti G, Fotia G. Orione, a web-based framework for NGS analysis in microbiology. Bioinformatics 2014; 30:1928-9. [PMID: 24618473 PMCID: PMC4071203 DOI: 10.1093/bioinformatics/btu135] [Citation(s) in RCA: 112] [Impact Index Per Article: 10.2] [Indexed: 11/13/2022]
Abstract
Summary: End-to-end next-generation sequencing microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult owing to a lack of interoperability, reproducibility and transparency. To overcome these limitations we present Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for next-generation sequencing microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics. Availability and implementation: Orione is available online at http://orione.crs4.it. Contact: gianmauro.cuccuru@crs4.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gianmauro Cuccuru
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Massimiliano Orsini
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Andrea Pinna
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Andrea Sbardellati
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Nicola Soranzo
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Paolo Uva
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Gianluigi Zanetti
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
- Giorgio Fotia
  - CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy
|
41
|
|
42
|
Liu T, Tsai CH, Lee WB, Chiang JH. Optimizing information in Next-Generation-Sequencing (NGS) reads for improving de novo genome assembly. PLoS One 2013; 8:e69503. [PMID: 23922726 PMCID: PMC3726674 DOI: 10.1371/journal.pone.0069503] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/07/2013] [Accepted: 06/11/2013] [Indexed: 12/01/2022] Open
Abstract
Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long-distance information. To improve de novo genome assembly, we develop a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from whose two ends the paired reads are obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% of perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. The assembly accuracies dropped, but still remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity, but future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
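The idea of recovering a DNA fragment from its two paired-end reads can be sketched in its simplest form: when the fragment is shorter than the combined read length, the two reads overlap and can be merged. The code below is a minimal exact-overlap merge written only for illustration. It is not the ARF-PE algorithm, which the abstract does not describe in detail, and the function names, the `min_overlap` parameter and the test sequence are all assumptions.

```python
from typing import Optional

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair whose ends overlap exactly, or return None.

    `read2` is assumed to come from the opposite strand, so it is
    reverse-complemented before looking for a suffix of `read1` that
    equals a prefix of the mate. The longest exact overlap wins.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"      # hypothetical 30 bp fragment
    forward = fragment[:20]                          # read from the 5' end
    reverse = reverse_complement(fragment[-20:])     # read from the other strand
    print(merge_pair(forward, reverse) == fragment)
```

A practical tool would also have to tolerate sequencing errors in the overlap and handle pairs that do not overlap at all, which this sketch deliberately ignores.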
Affiliation(s)
- Tsunglin Liu
  - Institute of Bioinformatics and Biosignal Transduction, National Cheng Kung University, Tainan, Taiwan
- Cheng-Hung Tsai
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Wen-Bin Lee
  - Institute of Bioinformatics and Biosignal Transduction, National Cheng Kung University, Tainan, Taiwan
- Jung-Hsien Chiang
  - Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
  - * E-mail:
|
43
|
Abstract
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
Affiliation(s)
- Niranjan Nagarajan
  - Computational and Systems Biology, Genome Institute of Singapore, 138672 Singapore
|
44
|
Abstract
The recovery and assembly of genome sequences from samples containing communities of organisms pose several challenges. Because it is rarely possible to disassociate the resident organisms prior to sequencing, a major obstacle is the assignment of sequences to a single genome that can be fully assembled. This chapter delineates many of the decisions, methodologies, and approaches that can lead to the generation of complete or nearly complete microbial genome sequences from heterogeneous samples; that is, the procedures that allow us to turn metagenomes into genomes.
Affiliation(s)
- Daniel B Sloan
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
|