1. Willink B, Tunström K, Nilén S, Chikhi R, Lemane T, Takahashi M, Takahashi Y, Svensson EI, Wheat CW. The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies. Nat Ecol Evol 2024; 8:83-97. [PMID: 37932383 PMCID: PMC10781644 DOI: 10.1038/s41559-023-02243-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/04/2023] [Indexed: 11/08/2023]
Abstract
Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 Mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.
Affiliation(s)
- Beatriz Willink
  - Department of Zoology, Stockholm University, Stockholm, Sweden
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Kalle Tunström
  - Department of Zoology, Stockholm University, Stockholm, Sweden
- Sofie Nilén
  - Department of Biology, Lund University, Lund, Sweden
- Rayan Chikhi
  - Sequence Bioinformatics, Institut Pasteur, Université Paris Cité, Paris, France
- Téo Lemane
  - University of Rennes, Inria, CNRS, IRISA, Rennes, France
- Michihiko Takahashi
  - Graduate School of Life Sciences, Tohoku University, Sendai, Japan
  - Graduate School of Agriculture, Kyoto University, Kyoto, Japan
- Yuma Takahashi
  - Graduate School of Science, Chiba University, Chiba, Japan
2. Sato K, Ikagawa Y, Niwa R, Nishioka H, Horie M, Iwahashi H. Genome Sequencing Unveils Nomadic Traits of Lactiplantibacillus plantarum in Japanese Post-Fermented Tea. Curr Microbiol 2023; 81:52. [PMID: 38155273 DOI: 10.1007/s00284-023-03566-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/17/2023] [Indexed: 12/30/2023]
Abstract
Post-fermented tea production involving microbial fermentation is limited to a few regions, such as Southeast Asia and Japan, with Japan's Shikoku island being particularly prominent. Lactiplantibacillus plantarum was the dominant species found in tea leaves after anaerobic fermentation of Awa-bancha in Miyoshi City, Tokushima, and Ishizuchi-kurocha in Ehime. Although the draft genome of L. plantarum from Japanese post-fermented tea has been previously reported, its genetic diversity requires further exploration. In this study, whole-genome sequencing was conducted on four L. plantarum strains isolated from Japanese post-fermented tea using nanopore sequencing. These isolates were then compared with L. plantarum from other sources to examine their genetic diversity, revealing that L. plantarum isolated from Japanese post-fermented tea contained several highly variable gene regions associated with sugar metabolism and transportation. However, no source-specific genes or clusters were identified within accessory or core gene regions. This study indicates that L. plantarum possesses high genetic diversity and that the unique environment of Japanese post-fermented tea does not appear to exert selective pressure on L. plantarum growth.
Affiliation(s)
- Kyoka Sato
  - Department of Life Science and Chemistry, Graduate School of Natural Science and Technology, Gifu University, Gifu, 501-1193, Japan
- Yuichiro Ikagawa
  - Department of Life Science and Chemistry, Graduate School of Natural Science and Technology, Gifu University, Gifu, 501-1193, Japan
- Ryo Niwa
  - Graduate School of Medicine, Kyoto University, Kyoto, 606-8501, Japan
- Hiroki Nishioka
  - Food and Biotechnology Division, Tokushima Prefectural Industrial Technology Center, Tokushima, 770-8021, Japan
- Masanori Horie
  - Health and Medical Research Institute (HMRI), National Institute of Advanced Industrial Science and Technology (AIST), Kagawa, 761-0395, Japan
- Hitoshi Iwahashi
  - Department of Life Science and Chemistry, Graduate School of Natural Science and Technology, Gifu University, Gifu, 501-1193, Japan
3. Li K, Xu P, Wang J, Yi X, Jiao Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 2023; 14:6556. [PMID: 37848433 PMCID: PMC10582259 DOI: 10.1038/s41467-023-42336-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 10/05/2023] [Indexed: 10/19/2023]
Abstract
Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
Affiliation(s)
- Kunpeng Li
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Peng Xu
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinpeng Wang
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xin Yi
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - China National Botanical Garden, Beijing, China
- Yuannian Jiao
  - State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - China National Botanical Garden, Beijing, China
4. Zhang Y, Lu HW, Ruan J. GAEP: a comprehensive genome assembly evaluating pipeline. J Genet Genomics 2023; 50:747-754. [PMID: 37245652 DOI: 10.1016/j.jgg.2023.05.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/09/2023] [Revised: 05/19/2023] [Accepted: 05/23/2023] [Indexed: 05/30/2023]
Abstract
With the rapid development of sequencing technologies, especially the maturity of third-generation sequencing technologies, there has been a significant increase in the number and quality of published genome assemblies. The emergence of these high-quality genomes has raised higher requirements for genome evaluation. Although numerous computational methods have been developed to evaluate assembly quality from various perspectives, the selective use of these evaluation methods can be arbitrary and inconvenient for fairly comparing assembly quality. To address this issue, we have developed the Genome Assembly Evaluating Pipeline (GAEP), which provides a comprehensive assessment pipeline for evaluating genome quality from multiple perspectives, including continuity, completeness, and correctness. Additionally, GAEP includes new functions for detecting misassemblies and evaluating assembly redundancy, which perform well in our testing. GAEP is publicly available at https://github.com/zy-optimistic/GAEP under the GPL3.0 License. With GAEP, users can quickly obtain accurate and reliable evaluation results, facilitating the comparison and selection of high-quality genome assemblies.
Affiliation(s)
- Yong Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Hong-Wei Lu
  - State Key Laboratory of Rice Biology and Breeding, China National Rice Research Institute, Chinese Academy of Agricultural Sciences, Hangzhou, Zhejiang 311401, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
5. Chakraborty M, Lara AG, Dang A, McCulloch KJ, Rainbow D, Carter D, Ngo LT, Solares E, Said I, Corbett-Detig RB, Gilbert LE, Emerson JJ, Briscoe AD. Sex-linked gene traffic underlies the acquisition of sexually dimorphic UV color vision in Heliconius butterflies. Proc Natl Acad Sci U S A 2023; 120:e2301411120. [PMID: 37552755 PMCID: PMC10438391 DOI: 10.1073/pnas.2301411120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 06/16/2023] [Indexed: 08/10/2023]
Abstract
The acquisition of novel sexually dimorphic traits poses an evolutionary puzzle: How do new traits arise and become sex-limited? Recently acquired color vision, sexually dimorphic in animals like primates and butterflies, presents a compelling model for understanding how traits become sex-biased. For example, some Heliconius butterflies uniquely possess UV (ultraviolet) color vision, which correlates with the expression of two differentially tuned UV-sensitive rhodopsins, UVRh1 and UVRh2. To discover how such traits become sexually dimorphic, we studied Heliconius charithonia, which exhibits female-specific UVRh1 expression. We demonstrate that females, but not males, discriminate different UV wavelengths. Through whole-genome shotgun sequencing and assembly of the H. charithonia genome, we discovered that UVRh1 is present on the W chromosome, making it obligately female-specific. By knocking out UVRh1, we show that UVRh1 protein expression is absent in mutant female eye tissue, as in wild-type male eyes. A PCR survey of UVRh1 sex-linkage across the genus shows that species with female-specific UVRh1 expression lack UVRh1 gDNA in males. Thus, acquisition of sex linkage is sufficient to achieve female-specific expression of UVRh1, though this does not preclude other mechanisms, such as cis-regulatory evolution, from also contributing. Moreover, both this event and the mutations leading to differential UV opsin sensitivity occurred early in the history of Heliconius. These results suggest a path for acquiring sexual dimorphism distinct from existing mechanistic models. We propose a model where gene traffic to heterosomes (the W or the Y) genetically partitions a trait by sex before a phenotype shifts (spectral tuning of UV sensitivity).
Affiliation(s)
- Mahul Chakraborty
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
  - Department of Biology, Texas A&M University, College Station, TX 77843
- Andrew Dang
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- Kyle J. McCulloch
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
  - Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, MN 55108
- Dylan Rainbow
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- David Carter
  - Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA 92521
- Luna Thanh Ngo
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- Edwin Solares
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- Iskander Said
  - Department of Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, CA 95064
- Russell B. Corbett-Detig
  - Department of Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, CA 95064
- J. J. Emerson
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
- Adriana D. Briscoe
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697
6. Larivière D, Abueg L, Brajuka N, Gallardo-Alba C, Grüning B, Ko BJ, Ostrovsky A, Palmada-Flores M, Pickett BD, Rabbani K, Balacco JR, Chaisson M, Cheng H, Collins J, Denisova A, Fedrigo O, Gallo GR, Giani AM, Gooder GM, Jain N, Johnson C, Kim H, Lee C, Marques-Bonet T, O'Toole B, Rhie A, Secomandi S, Sozzoni M, Tilley T, Uliano-Silva M, van den Beek M, Waterhouse RM, Phillippy AM, Jarvis ED, Schatz MC, Nekrutenko A, Formenti G. Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy. bioRxiv 2023:2023.06.28.546576. [PMID: 37425881 PMCID: PMC10327048 DOI: 10.1101/2023.06.28.546576] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/11/2023]
Abstract
Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long-reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhancing reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
Affiliation(s)
- Delphine Larivière
  - Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
- Linelle Abueg
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Cristóbal Gallardo-Alba
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Bjorn Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Byung June Ko
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Alex Ostrovsky
  - Departments of Biology and Computer Science, Johns Hopkins University, USA
- Marc Palmada-Flores
  - Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain
- Brandon D Pickett
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Keon Rabbani
  - Department of Quantitative and Computational Biology, University of Southern California
- Mark Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Joanna Collins
  - Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom
- Alexandra Denisova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russia
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Nivesh Jain
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Cassidy Johnson
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Heebal Kim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - eGnome, Inc, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Chul Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, NY, 10065, USA
- Tomas Marques-Bonet
  - Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Barcelona 08010, Spain
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès 08193, Spain
- Brian O'Toole
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Simona Secomandi
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Marcella Sozzoni
  - University of Florence, Department of Biology, Via Madonna del Piano 6, Sesto Fiorentino (FI)
- Tatiana Tilley
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Marius van den Beek
  - Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
- Robert M Waterhouse
  - Department of Ecology & Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
- Michael C Schatz
  - Departments of Biology and Computer Science, Johns Hopkins University, USA
- Anton Nekrutenko
  - Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, USA
7. Jeon MS, Jeong DM, Doh H, Kang HA, Jung H, Eyun SI. A practical comparison of the next-generation sequencing platform and assemblers using yeast genome. Life Sci Alliance 2023; 6:e202201744. [PMID: 36746534 PMCID: PMC9902641 DOI: 10.26508/lsa.202201744] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/28/2022] [Revised: 01/25/2023] [Accepted: 01/25/2023] [Indexed: 02/08/2023]
Abstract
Assembling fragmented whole-genome information from sequencing data is a necessary step for further genome-wide research. However, selecting the appropriate assembly pipeline for an unknown species is difficult because of species-specific genomic properties. Therefore, our study focused on the relatively more static proclivities of sequencing platforms and assembly algorithms rather than on the fickle genome sequences. A total of 212 draft and polished de novo assemblies were constructed under different sequencing platforms and assembly algorithms with the repetitive yeast genome. Our comprehensive data indicated that sequencing reads from Oxford Nanopore with R7.3 flow cells generated more continuous assemblies than those derived from the PacBio Sequel, although homopolymer-based assembly errors and chimeric contigs exist. In addition, the comparison between two second-generation sequencing platforms showed that Illumina NovaSeq 6000 provides more accurate and continuous assembly in the second-generation-sequencing-first pipeline, but MGI DNBSEQ-T7 provides cheap and accurate reads in the polishing process. Furthermore, our insight into the relationship among computational time, read length, and coverage depth provided clues to the optimal pipelines for yeast assembly.
Affiliation(s)
- Min-Seung Jeon
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Da Min Jeong
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Huijeong Doh
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Hyun Ah Kang
  - Department of Life Science, Chung-Ang University, Seoul, Korea
- Hyungtaek Jung
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Australia
- Seong-Il Eyun
  - Department of Life Science, Chung-Ang University, Seoul, Korea
8. Ventimiglia M, Castellacci M, Usai G, Vangelisti A, Simoni S, Natali L, Cavallini A, Mascagni F, Giordani T. Discovering the Repeatome of Five Species Belonging to the Asteraceae Family: A Computational Study. Plants (Basel) 2023; 12:1405. [PMID: 36987093 PMCID: PMC10058865 DOI: 10.3390/plants12061405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/08/2023] [Accepted: 03/20/2023] [Indexed: 06/19/2023]
Abstract
Genome divergence by repeat proliferation and/or loss is a process that plays a crucial role in species evolution. Nevertheless, knowledge of the variability related to repeat proliferation among species of the same family is still limited. Considering the importance of the Asteraceae family, here we present a first contribution towards the metarepeatome of five Asteraceae species. A comprehensive picture of the repetitive components of all genomes was obtained by genome skimming with Illumina sequence reads and by analyzing a pool of full-length long terminal repeat retrotransposons (LTR-REs). Genome skimming allowed us to estimate the abundance and variability of repetitive components. The structure of the metagenome of the selected species was composed of 67% repetitive sequences, of which LTR-REs represented the bulk of annotated clusters. The species essentially shared ribosomal DNA sequences, whereas the other classes of repetitive DNA were highly variable among species. The pool of full-length LTR-REs was retrieved from all the species and their age of insertion was established, showing several lineage-specific proliferation peaks over the last 15 million years. Overall, a large variability of repeat abundance at superfamily, lineage, and sublineage levels was observed, indicating that repeats within individual genomes followed different evolutionary and temporal dynamics, and that different events of amplification or loss of these sequences may have occurred after species differentiation.
9. Loy JD, Clawson ML, Adkins PRF, Middleton JR. Current and Emerging Diagnostic Approaches to Bacterial Diseases of Ruminants. Vet Clin North Am Food Anim Pract 2023; 39:93-114. [PMID: 36732002 DOI: 10.1016/j.cvfa.2022.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
The diagnostic approaches and methods to detect bacterial pathogens in ruminants are discussed, with a focus on cattle. Conventional diagnostic methods using culture, isolation, and characterization are being replaced or supplemented with new methods. These include molecular diagnostics such as real-time polymerase chain reaction and whole-genome sequencing. In addition, methods such as matrix-assisted laser desorption ionization-time-of-flight mass spectrometry enable rapid identification and enhanced pathogen characterization. These emerging diagnostic tools can greatly enhance the ability to detect and characterize pathogens, but performance and interpretation vary greatly across sample and pathogen types, disease syndromes, assay performance, and other factors.
Affiliation(s)
- John Dustin Loy
  - Nebraska Veterinary Diagnostic Center, School of Veterinary Medicine and Biomedical Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA
- Michael L Clawson
  - USDA, Agriculture Research Service US Meat Animal Research Center, Clay Center, NE, USA
- Pamela R F Adkins
  - Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA
- John R Middleton
  - Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA
10. Lai S, Pan S, Sun C, Coelho LP, Chen WH, Zhao XM. metaMIC: reference-free misassembly identification and correction of de novo metagenomic assemblies. Genome Biol 2022; 23:242. [PMID: 36376928 PMCID: PMC9661791 DOI: 10.1186/s13059-022-02810-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/10/2021] [Accepted: 11/01/2022] [Indexed: 11/16/2022]
Abstract
Evaluating the quality of metagenomic assemblies is important for constructing reliable metagenome-assembled genomes and downstream analyses. Here, we present metaMIC (https://github.com/ZhaoXM-Lab/metaMIC), a machine learning-based tool for identifying and correcting misassemblies in metagenomic assemblies. Benchmarking results on both simulated and real datasets demonstrate that metaMIC outperforms existing tools when identifying misassembled contigs. Furthermore, metaMIC is able to localize the misassembly breakpoints, and the correction of misassemblies by splitting at misassembly breakpoints can improve downstream scaffolding and binning results.
Affiliation(s)
- Senying Lai
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Shaojun Pan
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Chuqing Sun
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Luis Pedro Coelho
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan, China
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
  - International Human Phenome Institutes (Shanghai), Shanghai, China
  - Zhangjiang Fudan International Innovation Center, Shanghai, China
11. Ko BJ, Lee C, Kim J, Rhie A, Yoo DA, Howe K, Wood J, Cho S, Brown S, Formenti G, Jarvis ED, Kim H. Widespread false gene gains caused by duplication errors in genome assemblies. Genome Biol 2022; 23:205. [PMID: 36167596 PMCID: PMC9516828 DOI: 10.1186/s13059-022-02764-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 12/22/2022]
Abstract
Background: False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in popularly used previous genome assemblies for platypus, zebra finch, and Anna's Hummingbird, and in their new counterparts of the same species generated by the Vertebrate Genomes Project, whose pipeline attempted to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project where there was a prior chromosomal-level reference assembly to compare with. Results: Whole-genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were relatively more divergent than other parts of the genome, leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications compared to other gene families. Although present in a smaller proportion, we observe false duplications remaining in the Vertebrate Genomes Project assemblies that can be identified and purged. Conclusions: This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses on gene gains. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02764-1.
Affiliation(s)
- Byung June Ko
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Chul Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Juwan Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
- Dong Ahn Yoo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Seoae Cho
  - eGnome, Inc, Seoul, Republic of Korea
- Samara Brown
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Giulio Formenti
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Erich D Jarvis
  - Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Heebal Kim
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - eGnome, Inc, Seoul, Republic of Korea
12
|
Gopalan SS, Perry BW, Schield DR, Smith CF, Mackessy SP, Castoe TA. Origins, genomic structure and copy number variation of snake venom myotoxins. Toxicon 2022; 216:92-106. [PMID: 35820472 DOI: 10.1016/j.toxicon.2022.06.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2022] [Revised: 06/21/2022] [Accepted: 06/27/2022] [Indexed: 10/17/2022]
Abstract
Crotamine, myotoxin a and homologs are short peptides that often comprise major fractions of rattlesnake venoms and have been extensively studied for their bioactive properties. These toxins are thought to be important for rapidly immobilizing mammalian prey and are implicated in serious, and sometimes fatal, responses to envenomation in humans. While high quality reference genomes for multiple venomous snakes are available, the loci that encode myotoxins have not been successfully assembled in any existing genome assembly. Here, we integrate new and existing genomic and transcriptomic data from the Prairie Rattlesnake (Crotalus viridis viridis) to reconstruct, characterize, and infer the chromosomal locations of myotoxin-encoding loci. We integrate long-read transcriptomics (Pacific Biosciences' Iso-Seq) and short-read RNA-seq to infer gene sequence diversity and characterize patterns of myotoxin and paralogous β-defensin expression across multiple tissues. We also identify two long non-coding RNA sequences which both encode functional myotoxins, demonstrating a newly discovered source of venom coding sequence diversity. We also integrate long-range mate-pair chromatin contact data and linked-read sequencing to infer the structure and chromosomal locations of the three myotoxin-like loci. Further, we conclude that the venom-associated myotoxin is located on chromosome 1 and is adjacent to non-venom paralogs. Consistent with this locus contributing to venom composition, we find evidence that the promoter of this gene is selectively open in venom gland tissue and contains transcription factor binding sites implicated in broad trans-regulatory pathways that regulate snake venoms. This study provides the best genomic reconstruction of myotoxin loci to date and raises questions about the physiological roles and interplay between myotoxin and related genes, as well as the genomic origins of snake venom variation.
Affiliation(s)
- Siddharth S Gopalan
  - Department of Biology, 501 S. Nedderman Dr., The University of Texas Arlington, Arlington, TX, 76019, USA
- Blair W Perry
  - Department of Biology, 501 S. Nedderman Dr., The University of Texas Arlington, Arlington, TX, 76019, USA
  - School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA
- Drew R Schield
  - Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, 80309, USA
- Cara F Smith
  - School of Biological Sciences, 501 20th Street, University of Northern Colorado, Greeley, CO, 80639, USA
  - Department of Biochemistry and Molecular Biology, 12801 East 17th Avenue, University of Colorado Denver, Aurora, CO, 80045, USA
- Stephen P Mackessy
  - School of Biological Sciences, 501 20th Street, University of Northern Colorado, Greeley, CO, 80639, USA
- Todd A Castoe
  - Department of Biology, 501 S. Nedderman Dr., The University of Texas Arlington, Arlington, TX, 76019, USA
13
|
Formenti G, Rhie A, Walenz BP, Thibaud-Nissen F, Shafin K, Koren S, Myers EW, Jarvis ED, Phillippy AM. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat Methods 2022; 19:696-704. [PMID: 35361932 PMCID: PMC9745813 DOI: 10.1038/s41592-022-01445-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 07/13/2021] [Accepted: 03/07/2022] [Indexed: 12/15/2022]
Abstract
Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
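The idea of validating a variant against k-mers observed in the reads can be illustrated with a deliberately simplified Python sketch. This is not Merfin's scoring scheme, which the abstract describes only as being based on expected k-mer multiplicity; here a variant is kept simply when applying it reduces the number of k-mers that never occur in the reads. All function names, the toy sequences and the choice of k = 4 are assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all k-mers of seq in order (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def missing_kmers(seq, start, length, k, read_counts):
    """Count k-mers overlapping seq[start:start+length] that are absent from the reads."""
    lo = max(0, start - k + 1)
    hi = min(len(seq), start + length + k - 1)
    return sum(1 for km in kmers(seq[lo:hi], k) if read_counts[km] == 0)


def keep_variant(assembly, pos, ref_allele, alt_allele, read_counts, k):
    """Toy rule (an assumption, not Merfin's): keep the variant if the edited
    sequence has fewer read-unsupported k-mers than the unedited one."""
    if assembly[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the assembly")
    edited = assembly[:pos] + alt_allele + assembly[pos + len(ref_allele):]
    before = missing_kmers(assembly, pos, len(ref_allele), k, read_counts)
    after = missing_kmers(edited, pos, len(alt_allele), k, read_counts)
    return after < before


# Toy data: one error-free read and an assembly carrying a single-base error.
reads = ["ACGTTGCAAGGCTA"]
assembly = "ACGTTGTAAGGCTA"
counts = count_read_kmers(reads, 4)
fix_is_kept = keep_variant(assembly, 6, "T", "C", counts, 4)
spurious_is_kept = keep_variant(assembly, 2, "G", "A", counts, 4)
```

In this toy case the call that corrects the assembly error is kept and the call that would introduce unsupported k-mers is rejected, without consulting alignment quality or a variant caller's own score.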
Affiliation(s)
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Brian P Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Eugene W Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
14
|
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 1094] [Impact Index Per Article: 547.0] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Affiliation(s)
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Andrey V. Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Mitchell R. Vollger
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Nicolas Altemose
  - Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Lev Uralsky
  - Sirius University of Science and Technology; Sochi, Russia
  - Vavilov Institute of General Genetics; Moscow, Russia
- Ariel Gershman
  - Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Savannah J. Hoyt
  - Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Michael Alonge
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Gerard G. Bouffard
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Shelise Y. Brooks
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gina V. Caldas
  - Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
- Nae-Chyun Chen
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
  - Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Philip C. Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Richard Durbin
  - Wellcome Sanger Institute; Cambridge, UK
  - Department of Genetics, University of Cambridge; Cambridge, UK
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Giulio Formenti
  - Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
  - Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Robert S. Fulton
  - Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
- Erik Garrison
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
  - University of Tennessee Health Science Center; Memphis, TN, USA
- Patrick G.S. Grady
  - Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Ira M. Hall
  - Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
- Nancy F. Hansen
  - Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gabrielle A. Hartley
  - Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Chirag Jain
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
  - Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Erich D. Jarvis
  - Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
  - Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Melanie Kirsche
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- Milinn Kremitzki
  - McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
  - Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Valerie V. Maduro
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Tobias Marschall
  - Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
- Ann M. McCartney
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Jennifer McDaniel
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Danny E. Miller
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
- James C. Mullikin
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
  - Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Eugene W. Myers
  - Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
- Nathan D. Olson
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Tamara Potapova
  - Stowers Institute for Medical Research; Kansas City, MO, USA
- Evgeny I. Rogaev
  - Sirius University of Science and Technology; Sochi, Russia
  - Vavilov Institute of General Genetics; Moscow, Russia
  - Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
  - Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
- Steven L. Salzberg
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Valerie A. Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Colin J. Shew
  - Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Alaina Shumate
  - Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Ying Sims
  - Wellcome Sanger Institute; Cambridge, UK
- Daniela C. Soto
  - Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan Sović
  - Pacific Biosciences; Menlo Park, CA, USA
  - Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
- Aaron Streets
  - Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
  - Chan Zuckerberg Biohub; San Francisco, CA, USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Justin Wagner
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Brian P. Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Stephanie M. Yan
  - Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Alice C. Young
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Samantha Zarate
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Urvashi Surti
  - Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
- Rajiv C. McCoy
  - Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Megan Y. Dennis
  - Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan A. Alexandrov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
  - Vavilov Institute of General Genetics; Moscow, Russia
  - Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
- Jennifer L. Gerton
  - Stowers Institute for Medical Research; Kansas City, MO, USA
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
- Rachel J. O’Neill
  - Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Winston Timp
  - Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Justin M. Zook
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
  - Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
  - Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Adam M. Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
15
|
Mascagni F, Barghini E, Ceccarelli M, Baldoni L, Trapero C, Díez CM, Natali L, Cavallini A, Giordani T. The Singular Evolution of Olea Genome Structure. Front Plant Sci 2022; 13:869048. [PMID: 35432417 PMCID: PMC9009077 DOI: 10.3389/fpls.2022.869048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Accepted: 03/07/2022] [Indexed: 06/14/2023]
Abstract
The current view of plant genome evolution proposes that genome size has mainly been determined by polyploidisation and amplification/loss of transposons, with a minor role played by other repeated sequences, such as tandem repeats. In cultivated olive (Olea europaea subsp. europaea var. europaea), available data suggest a singular model of genome evolution, in which a massive expansion of tandem-repeated sequences accompanied changes in nuclear architecture. This peculiar scenario highlights the importance of focusing on Olea genus evolution, to shed light on mechanisms that led to its present genomic structure. Next-generation sequencing technologies, bioinformatics and in situ hybridisation were applied to study the genomic structure of five related Olea taxa, which originated at different times from their last common ancestor. On average, repetitive DNA in the Olea taxa ranged from ~59% to ~73% of the total genome, showing remarkable differences in terms of composition. Among repeats, we identified 11 major families of tandem repeats, with different abundances in the analysed taxa, five of which were novel discoveries. Interestingly, overall tandem repeat abundance was inversely correlated to that of retrotransposons. This trend might imply a competition in the proliferation of these repeat classes. Indeed, O. paniculata, the species closest to the Olea common ancestor, showed very few tandem-repeated sequences, while it was rich in long terminal repeat retrotransposons, suggesting that the amplification of tandem repeats occurred after its divergence from the Olea ancestor. Furthermore, some tandem repeats were physically localised in closely related O. europaea subspecies (i.e., cultivated olive and O. europaea subsp. cuspidata), which showed a significant difference in tandem repeat abundance. For four tandem repeat families, a similar number of hybridisation signals were observed in both subspecies, apparently indicating that, after their dissemination throughout the olive genome, these tandem repeat families differentially amplified while maintaining the same positions in each genome. Overall, our research identified the temporal dynamics shaping genome structure during Olea speciation, which represented a singular model of genome evolution in higher plants.
Affiliation(s)
- Flavia Mascagni
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
- Elena Barghini
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
- Marilena Ceccarelli
  - Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
- Luciana Baldoni
  - CNR, Institute of Biosciences and BioResources, Perugia, Italy
- Carlos Trapero
  - CSIRO Agriculture & Food, Narrabri, NSW, Australia
  - Agronomy Department, University of Cordoba, Cordoba, Spain
- Lucia Natali
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
- Andrea Cavallini
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
- Tommaso Giordani
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
16
|
Petrillo M, Fabbri M, Kagkli DM, Querci M, Van den Eede G, Alm E, Aytan-Aktug D, Capella-Gutierrez S, Carrillo C, Cestaro A, Chan KG, Coque T, Endrullat C, Gut I, Hammer P, Kay GL, Madec JY, Mather AE, McHardy AC, Naas T, Paracchini V, Peter S, Pightling A, Raffael B, Rossen J, Ruppé E, Schlaberg R, Vanneste K, Weber LM, Westh H, Angers-Loustau A. A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing. F1000Res 2022; 10:80. [PMID: 35847383 PMCID: PMC9243550 DOI: 10.12688/f1000research.39214.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/10/2022] [Indexed: 11/20/2022]
Abstract
Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain “live” (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines’ implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.
Affiliation(s)
- Marco Fabbri
  - European Commission Joint Research Centre, Ispra, Italy
- Guy Van den Eede
  - European Commission Joint Research Centre, Ispra, Italy
  - European Commission Joint Research Centre, Geel, Belgium
- Erik Alm
  - The European Centre for Disease Prevention and Control, Stockholm, Sweden
- Derya Aytan-Aktug
  - National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Catherine Carrillo
  - Ottawa Laboratory – Carling, Canadian Food Inspection Agency, Ottawa, Ontario, Canada
- Kok-Gan Chan
  - International Genome Centre, Jiangsu University, Zhenjiang, China
  - Division of Genetics and Molecular Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Teresa Coque
  - Servicio de Microbiología, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
  - Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Carlos III Health Institute, Madrid, Spain
- Ivo Gut
  - Centro Nacional de Análisis Genómico, Centre for Genomic Regulation (CNAG-CRG), Barcelona Institute of Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Paul Hammer
  - BIOMES. NGS GmbH c/o Technische Hochschule Wildau, Wildau, Germany
- Gemma L. Kay
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Jean-Yves Madec
  - Unité Antibiorésistance et Virulence Bactériennes, ANSES Site de Lyon, Lyon, France
- Alison E. Mather
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
  - University of East Anglia, Norwich, UK
- Thierry Naas
  - French-NRC for CPEs, Service de Bactériologie-Hygiène, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
- Silke Peter
  - Institute of Medical Microbiology and Hygiene, University of Tübingen, Tübingen, Germany
- Arthur Pightling
  - Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
- John Rossen
  - Department of Medical Microbiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Robert Schlaberg
  - Department of Pathology, University of Utah, Salt Lake City, UT, USA
- Kevin Vanneste
  - Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium
- Lukas M. Weber
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
  - Present address: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
17
|
MacDonald ML, Lee KH. EvalDNA: a machine learning-based tool for the comprehensive evaluation of mammalian genome assembly quality. BMC Bioinformatics 2021; 22:570. [PMID: 34837948 PMCID: PMC8627028 DOI: 10.1186/s12859-021-04480-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 11/15/2021] [Indexed: 11/16/2022]
Abstract
Background: To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment.
Results: EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study.
Conclusions: EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04480-2.
Affiliation(s)
- Madolyn L MacDonald
  - Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, 19711, USA
  - Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave., Newark, 19716, USA
  - Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, 19711, USA
- Kelvin H Lee
  - Delaware Biotechnology Institute, University of Delaware, 15 Innovation Way, Newark, 19711, USA
  - Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, 19716, USA

18
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023]
Abstract
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Affiliation(s)
- Mohammed Alser
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Jeremy Rotman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Dhrithi Deshpande
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
- Kodi Taraszka
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Harry Taegyun Yang
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
  - Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Victor Xue
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Benjamin D Singer
  - Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
  - Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Brunilda Balliu
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- David Koslicki
  - Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
  - Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Can Alkan
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
- Onur Mutlu
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA

19
Dida F, Yi G. Empirical evaluation of methods for de novo genome assembly. PeerJ Comput Sci 2021; 7:e636. [PMID: 34307867 PMCID: PMC8279138 DOI: 10.7717/peerj-cs.636] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/04/2021] [Accepted: 06/19/2021] [Indexed: 06/12/2023]
Abstract
Technologies for next-generation sequencing (NGS) have stimulated an exponential rise in high-throughput sequencing projects and resulted in the development of new read-assembly algorithms. A drastic reduction in the costs of generating short reads on the genomes of new organisms is attributable to recent advances in NGS technologies such as Ion Torrent, Illumina, and PacBio. Genome research has led to the creation of high-quality reference genomes for several organisms, and de novo assembly is a key initiative that has facilitated gene discovery and other studies. More powerful analytical algorithms are needed to work on the increasing amount of sequence data. We make a thorough comparison of the de novo assembly algorithms to allow new users to clearly understand the assembly algorithms: overlap-layout-consensus and de-Bruijn-graph, string-graph based assembly, and hybrid approach. We also address the computational efficacy of each algorithm's performance, challenges faced by the assembly tools used, and the impact of repeats. Our results compare the relative performance of the different assemblers and other related assembly differences with and without the reference genome. We hope that this analysis will contribute to further the application of de novo sequences and help the future growth of assembly algorithms.
Affiliation(s)
- Firaol Dida
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
- Gangman Yi
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea

20
Bai Y, Lin W, Xu J, Song J, Yang D, Chen YE, Li L, Li Y, Wang Z, Zhang J. Improving the genome assembly of rabbits with long-read sequencing. Genomics 2021; 113:3216-3223. [PMID: 34051323 DOI: 10.1016/j.ygeno.2021.05.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/09/2020] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 10/21/2022]
Abstract
The European rabbit (Oryctolagus cuniculus) is important as a biomedical model given its unique features in immunity and metabolism. The current reference genome OryCun2.0 established with whole-genome shotgun sequencing was quite fragmented and had not been updated for ten years. In this work, we provided a new rabbit genome assembly UM_NZW_1.0 to improve OryCun2.0 by leveraging the contig lengths based on long-read sequencing and a wealth of available Illumina paired-end sequence data. UM_NZW_1.0 showed a remarkable increase of continuity compared with OryCun2.0, with 5 times longer contig N50 and approximately 75% gaps closed. Many of the closed gaps were overlapped with protein-coding genes or transcriptional features, resulting in an enhancement of gene annotations. In particular, UM_NZW_1.0 presented a more complete landscape of the MHC region and the IGH locus, therefore provided a valuable resource for future researches on rabbits.
Affiliation(s)
- Yiqin Bai
  - State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Weili Lin
  - Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Jie Xu
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
- Jun Song
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
- Dongshan Yang
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
- Y Eugene Chen
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA
- Lin Li
  - State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
  - School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
- Yixue Li
  - Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
  - Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, China
- Zhen Wang
  - Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Jifeng Zhang
  - Center for Advanced Models for Translational Sciences and Therapeutics, University of Michigan Medical Center, Ann Arbor, MI, USA

21
Petrillo M, Fabbri M, Kagkli DM, Querci M, Van den Eede G, Alm E, Aytan-Aktug D, Capella-Gutierrez S, Carrillo C, Cestaro A, Chan KG, Coque T, Endrullat C, Gut I, Hammer P, Kay GL, Madec JY, Mather AE, McHardy AC, Naas T, Paracchini V, Peter S, Pightling A, Raffael B, Rossen J, Ruppé E, Schlaberg R, Vanneste K, Weber LM, Westh H, Angers-Loustau A. A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing. F1000Res 2021; 10:80. [DOI: 10.12688/f1000research.39214.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 02/02/2021] [Indexed: 01/12/2023]
Abstract
Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain “live” (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines’ implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.

22
Takayama J, Tadaka S, Yano K, Katsuoka F, Gocho C, Funayama T, Makino S, Okamura Y, Kikuchi A, Sugimoto S, Kawashima J, Otsuki A, Sakurai-Yageta M, Yasuda J, Kure S, Kinoshita K, Yamamoto M, Tamiya G. Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference. Nat Commun 2021; 12:226. [PMID: 33431880 PMCID: PMC7801658 DOI: 10.1038/s41467-020-20146-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 12/05/2019] [Accepted: 11/17/2020] [Indexed: 12/21/2022]
Abstract
The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using > 100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.
Affiliation(s)
- Jun Takayama
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Shu Tadaka
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Kenji Yano
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Fumiki Katsuoka
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Chinatsu Gocho
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Takamitsu Funayama
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Satoshi Makino
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Yasunobu Okamura
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Atsuo Kikuchi
  - Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Sachiyo Sugimoto
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Junko Kawashima
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Akihito Otsuki
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Mika Sakurai-Yageta
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Jun Yasuda
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Division of Molecular and Cellular Oncology, Miyagi Cancer Center Research Institute, 47-1, Nodayama, Medeshima-Shiode, Natori, Miyagi, 981-1293, Japan
- Shigeo Kure
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Department of Pediatrics, Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
- Kengo Kinoshita
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki Aza-Aoba, Aoba-ku, Sendai, Miyagi, 980-8579, Japan
- Masayuki Yamamoto
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
- Gen Tamiya
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8573, Japan
  - Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
  - Tohoku University Graduate School of Medicine, 2-1, Seiryo-machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan

23
Naranpanawa DNU, Chandrasekara CHWMRB, Bandaranayake PCG, Bandaranayake AU. Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists. Sci Rep 2020; 10:18236. [PMID: 33106560 PMCID: PMC7588437 DOI: 10.1038/s41598-020-75270-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/08/2019] [Accepted: 09/21/2020] [Indexed: 02/07/2023]
Abstract
Recent advances in next-generation sequencing technologies have paved the path for a considerable amount of sequencing data at a relatively low cost. This has revolutionized the genomics and transcriptomics studies. However, different challenges are now created in handling such data with available bioinformatics platforms both in assembly and downstream analysis performed in order to infer correct biological meaning. Though there are a handful of commercial software and tools for some of the procedures, cost of such tools has made them prohibitive for most research laboratories. While individual open-source or free software tools are available for most of the bioinformatics applications, those components usually operate standalone and are not combined for a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures starting from raw sequence data too complicated and time-consuming with the associated learning-curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeats (SSR) primer design solely based on tools that are available online for free use. For validation of the developed workflow, we used Illumina HiSeq reads of different tissue samples of Santalum album (sandalwood), generated from a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
Affiliation(s)
- D N U Naranpanawa
  - Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
  - Postgraduate Institute of Science, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- C H W M R B Chandrasekara
  - Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- P C G Bandaranayake
  - Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- A U Bandaranayake
  - Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Peradeniya, 20400, Sri Lanka

24
Characterization of Mobile Genetic Elements Using Long-Read Sequencing for Tracking Listeria monocytogenes from Food Processing Environments. Pathogens 2020; 9:pathogens9100822. [PMID: 33036450 PMCID: PMC7599586 DOI: 10.3390/pathogens9100822] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/01/2020] [Revised: 09/26/2020] [Accepted: 10/01/2020] [Indexed: 02/02/2023]
Abstract
Recently developed nanopore sequencing technologies offer a unique opportunity to rapidly close the genome and to identify complete sequences of mobile genetic elements (MGEs). In this study, 17 isolates of Listeria monocytogenes (Lm) epidemic clone II (ECII) from seven ready-to-eat meat or poultry processing facilities, not known to be associated with outbreaks, were shotgun sequenced, and among them, five isolates were further subjected to long-read sequencing. Additionally, 26 genomes of Lm ECII isolates associated with three listeriosis outbreaks in the U.S. and South Africa were obtained from the National Center for Biotechnology Information (NCBI) database and analyzed to evaluate if MGEs may be used as a high-resolution genetic marker for identifying and sourcing the origin of Lm. The analyses identified four comK prophages in 11 non-outbreak isolates from four facilities and three comK prophages in 20 isolates associated with two outbreaks that occurred in the U.S. In addition, three different plasmids were identified among 10 non-outbreak isolates and 14 outbreak isolates. Each comK prophage and plasmid was conserved among the isolates sharing it. Different prophages from different facilities or outbreaks had significant genetic variations, possibly due to horizontal gene transfer. Phylogenetic analysis showed that isolates from the same facility or the same outbreak always closely clustered. The time of most recent common ancestor of the Lm ECII isolates was estimated to be in March 1816 with the average nucleotide substitution rate of 3.1 × 10⁻⁷ substitutions per site per year. This study showed that complete MGE sequences provide a good signal to determine the genetic relatedness of Lm isolates, to identify persistence or repeated contamination that occurred within food processing environment, and to study the evolutionary history among closely related isolates.

25
Asalone KC, Ryan KM, Yamadi M, Cohen AL, Farmer WG, George DJ, Joppert C, Kim K, Mughal MF, Said R, Toksoz-Exley M, Bisk E, Bracht JR. Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Comput Biol 2020; 16:e1008104. [PMID: 32735589 PMCID: PMC7423139 DOI: 10.1371/journal.pcbi.1008104] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/01/2019] [Revised: 08/12/2020] [Accepted: 06/29/2020] [Indexed: 12/13/2022]
Abstract
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet is common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by assembly methodologies tested, the assemblies generated varied widely in fragmentation level and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansion or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse.
In the genomic era, genomes must be reconstructed from fragments using computational methods, or assemblers. How do we know that a new genome assembly is correct? This is important because errors in assembly can lead to downstream problems in gene predictions and these inaccurate results can contaminate databases, affecting later comparative studies. A particular challenge occurs when a diploid organism inherits two highly divergent genome copies from its parents. While it is widely appreciated that this type of data is difficult for assemblers to handle properly, here we show that the process is prone to more errors than previously appreciated. Specifically, we document examples of regional expansion and collapse, affecting downstream gene prediction accuracy, but without changing the overall genome assembly size or other metrics of accuracy. Our results suggest that assembly evaluation methods should be altered to identify whether regional expansions and collapses are present in the genome assembly.
Affiliation(s)
- Kathryn C. Asalone
  - Biology Department, American University, Washington DC, United States of America
- Kara M. Ryan
  - Biology Department, American University, Washington DC, United States of America
- Maryam Yamadi
  - Biology Department, American University, Washington DC, United States of America
- Annastelle L. Cohen
  - Biology Department, American University, Washington DC, United States of America
- William G. Farmer
  - Biology Department, American University, Washington DC, United States of America
- Deborah J. George
  - Biology Department, American University, Washington DC, United States of America
- Claudia Joppert
  - Biology Department, American University, Washington DC, United States of America
- Kaitlyn Kim
  - Biology Department, American University, Washington DC, United States of America
- Madeeha Froze Mughal
  - Biology Department, American University, Washington DC, United States of America
- Rana Said
  - Biology Department, American University, Washington DC, United States of America
- Metin Toksoz-Exley
  - Mathematics and Statistics Department, American University, Washington DC, United States of America
- Evgeny Bisk
  - Office of Information Technology, American University, Washington DC, United States of America
- John R. Bracht
  - Biology Department, American University, Washington DC, United States of America

26
Zwe YH, Chin SF, Kohli GS, Aung KT, Yang L, Yuk HG. Whole genome sequencing (WGS) fails to detect antimicrobial resistance (AMR) from heteroresistant subpopulation of Salmonella enterica. Food Microbiol 2020; 91:103530. [PMID: 32539974 DOI: 10.1016/j.fm.2020.103530] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/30/2019] [Revised: 03/28/2020] [Accepted: 04/22/2020] [Indexed: 10/24/2022]
Abstract
Due to rapidly falling costs, whole genome sequencing (WGS) is becoming an essential tool in the surveillance of antimicrobial resistance (AMR) in Salmonella enterica. Although there have been many recent works evaluating the accuracy of WGS in predicting AMR from a large number of Salmonella isolates, little attention has been devoted to deciphering the underlying causes of disagreement between the WGS genotype and experimentally determined AMR phenotype. This study analyzed the genomes of six S. enterica isolates previously obtained from raw chicken which exhibited disagreements between WGS genotype and AMR phenotype. A total of five WGS false negative predictions toward ampicillin, amoxicillin/clavulanate, colistin, and fosfomycin resistance were presented in conjunction with their corresponding empirical phenotypic and/or genetic evidence of heteroresistance. A further case study highlighting the inherent limitations of WGS to detect the underlying genetic mechanisms of colistin heteroresistance was presented. These findings implicate heteroresistance as an underlying cause for false negative WGS-based AMR predictions in S. enterica and suggest that widespread use of WGS in the surveillance of AMR in food isolates might severely underestimate true resistance rates.
Affiliation(s)
- Ye Htut Zwe
  - Department of Food Science and Technology, National University of Singapore, Singapore
- Seow Fong Chin
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
- Gurjeet Singh Kohli
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore; Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- Kyaw Thu Aung
  - National Centre for Food Science, Singapore Food Agency, Singapore; School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore
- Liang Yang
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore; School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Hyun-Gyun Yuk
  - Department of Food Science and Technology, Korea National University of Transportation, Jeungpyeong-gun, Chungbuk, Republic of Korea
|
27
|
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020; 20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023]
Abstract
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
|
28
|
Ye F, Han Y, Zhu J, Li P, Zhang Q, Lin Y, Wang T, Lv H, Wang C, Wang C, Zhang J. First Identification of Human Adenovirus Subtype 21a in China With MinION and Illumina Sequencers. Front Genet 2020; 11:285. [PMID: 32318094 PMCID: PMC7155751 DOI: 10.3389/fgene.2020.00285] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/27/2019] [Accepted: 03/09/2020] [Indexed: 12/03/2022]
Abstract
Human adenoviruses (HAdVs) have been demonstrated to cause a diversity of diseases among children and adults. The circulation of human adenovirus type 21 (HAdV21) has been mainly documented within closed environments in several countries. Nonetheless, respiratory infections or outbreaks due to HAdV21 have never been reported in China. MinION and Illumina platforms were employed to identify the potential pathogen from a throat swab. Discrepancies between MinION and Illumina sequencing were validated and corrected via polymerase chain reaction (PCR). Genomic characterization and recombinant event detection were then performed. Among the 35,466 high-quality MinION reads, a total of 5,999 reads (16.91%) could be aligned to HAdV21 reference genomes (genome sizes ≈35.3 kb), among which 20 had a length of >30 kb. A genome sequence assembled from MinION reads was further classified as HAdV subtype 21a. Random downsampling revealed as few as 500 nanopore reads could cover ≥96.49% of current genome. Illumina sequencing displayed good consistency (pairwise nucleotide identity = 99.91%) with MinION sequencing but with 31 discrepancies that were further validated and confirmed by PCR coupled with Sanger sequencing. Restriction enzymes such as BamHI and KpnI were able to distinguish the present genome from HAdV21 prototype and HAdV21b. Phylogenetic analysis employing whole-genome sequences placed our genome with members only from subtype 21a. Common features among HAdV21a strains were identified, including polymorphisms discovered in penton and 100 kDa hexon assembly–associated proteins and a recombinant event in the E4 gene. Using MinION and Illumina sequencers, we identified the first HAdV21a strain from China, which could provide key genomic data for disease control and epidemiological investigations.
Affiliation(s)
- Fuqiang Ye
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Yifang Han
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Juanjuan Zhu
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing, China
- Peng Li
  - Center for Infectious Disease Control, Center for Disease Control and Prevention of People's Liberation Army of China, Beijing, China
- Qi Zhang
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Yanfeng Lin
  - Center for Infectious Disease Control, Center for Disease Control and Prevention of People's Liberation Army of China, Beijing, China
- Taiwu Wang
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Heng Lv
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Changjun Wang
  - Center for Infectious Disease Control, Center for Disease Control and Prevention of People's Liberation Army of China, Beijing, China
- Chunhui Wang
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
- Jinhai Zhang
  - Department of Disease Control and Prevention, Center for Disease Control and Prevention of Eastern Theater Command, Nanjing, China
|
29
|
Xu XW, Shao CW, Xu H, Zhou Q, You F, Wang N, Li WL, Li M, Chen SL. Draft genomes of female and male turbot Scophthalmus maximus. Sci Data 2020; 7:90. [PMID: 32165614 PMCID: PMC7067757 DOI: 10.1038/s41597-020-0426-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/19/2019] [Accepted: 02/20/2020] [Indexed: 01/14/2023]
Abstract
Turbot (Scophthalmus maximus) is a commercially important flatfish species in aquaculture. It has a drastic sexual dimorphism, with females growing faster than males. In the present study, we sequenced and de novo assembled female and male turbot genomes. The assembled female genome was 568 Mb (scaffold N50, 6.2 Mb, BUSCO 97.4%), and the male genome was 584 Mb (scaffold N50, 5.9 Mb, BUSCO 96.6%). Using two genetic maps, we anchored female scaffolds representing 535 Mb onto 22 chromosomes. Annotation of the female anchored genome identified 87.8 Mb transposon elements and 20,134 genes. We identified 17,936 gene families, of which 369 gene families were flatfish specific. Phylogenetic analysis showed that the turbot, Japanese flounder and Chinese tongue sole form a clade that diverged from other teleosts approximately 78 Mya. This report of female and male turbot draft genomes and annotated genes provides a new resource for identifying sex determination genes, elucidating the evolution of adaptive traits in flatfish and developing genetic techniques to increase the sustainability of turbot aquaculture.
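The scaffold N50 figures quoted in this abstract follow the usual definition: the length L such that scaffolds of length at least L together hold half or more of the assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the turbot study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that pushes the running sum to half the total.
        if 2 * running >= total:
            return length
    return 0  # empty input: no scaffolds, so no N50


# Hypothetical toy assembly (lengths in Mb), not the turbot scaffolds.
toy_scaffolds_mb = [8, 6, 5, 3, 2, 1]
print(n50(toy_scaffolds_mb))  # 8 + 6 = 14 is at least half of 25, so this prints 6
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about correctness, which is why the abstract reports BUSCO completeness alongside it.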
Affiliation(s)
- Xi-Wen Xu
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - Key Lab of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao, China
- Chang-Wei Shao
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - Key Lab of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao, China
- Hao Xu
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, China
- Qian Zhou
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - Key Lab of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao, China
- Feng You
  - Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- Na Wang
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - Key Lab of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao, China
- Wen-Long Li
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
- Ming Li
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, China
- Song-Lin Chen
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, China
  - Key Lab of Sustainable Development of Marine Fisheries, Ministry of Agriculture, Qingdao, China
|
30
|
Ma C, Kingsford C. Detecting, Categorizing, and Correcting Coverage Anomalies of RNA-Seq Quantification. Cell Syst 2019; 9:589-599.e7. [PMID: 31786209 DOI: 10.1016/j.cels.2019.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/06/2019] [Revised: 07/09/2019] [Accepted: 10/17/2019] [Indexed: 11/13/2022]
Abstract
Because of incomplete reference transcriptomes, incomplete sequencing bias models, or other modeling defects, algorithms to infer isoform expression from RNA sequencing (RNA-seq) sometimes do not accurately model expression. We present a computational method to detect instances where a quantification algorithm could not completely explain the input reads. Our approach identifies regions where the read coverage significantly deviates from expectation. We call these regions "expression anomalies." We further present a method to attribute their cause to either the incompleteness of the reference transcriptome or algorithmic mistakes. We detect anomalies for 30 GEUVADIS and 16 Human Body Map samples. By correcting anomalies when possible, we reduce the number of falsely predicted instances of differential expression. Anomalies that cannot be corrected are suspected to indicate the existence of isoforms unannotated by the reference. We detected 88 common anomalies of this type and find that they tend to have a lower-than-expected coverage toward their 3' ends.
Affiliation(s)
- Cong Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
- Carl Kingsford
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
|
31
|
Í Kongsstovu S, Mikalsen SO, Homrum EÍ, Jacobsen JA, Flicek P, Dahl HA. Using long and linked reads to improve an Atlantic herring (Clupea harengus) genome assembly. Sci Rep 2019; 9:17716. [PMID: 31776409 PMCID: PMC6881392 DOI: 10.1038/s41598-019-54151-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/27/2019] [Accepted: 11/08/2019] [Indexed: 01/01/2023]
Abstract
Atlantic herring (Clupea harengus) is one of the most abundant fish species in the world. It is an important economical and nutritional resource, as well as a crucial part of the North Atlantic ecosystem. In 2016, a draft herring genome assembly was published. Being a species of such importance, we sought to independently verify and potentially improve the herring genome assembly. We sequenced the herring genome generating paired-end, mate-pair, linked and long reads. Three assembly versions of the herring genome were generated based on a de novo assembly (A1), which was scaffolded using linked and long reads (A2) and then merged with the previously published assembly (A3). The resulting assemblies were compared using parameters describing the size, fragmentation, correctness, and completeness of the assemblies. Results showed that the A2 assembly was less fragmented, more complete and more correct than A1. A3 showed improvement in fragmentation and correctness compared with A2 and the published assembly but was slightly less complete than the published assembly. Thus, we here confirmed the previously published herring assembly, and made improvements by further scaffolding the assembly and removing low-quality sequences using linked and long reads and merging of assemblies.
Affiliation(s)
- Sunnvør Í Kongsstovu
  - Amplexa Genetics A/S, Hoyvíksvegur 51, FO-100, Tórshavn, Faroe Islands; University of the Faroe Islands, Department of Science and Technology, Vestara Bryggja 15, FO-100, Tórshavn, Faroe Islands; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Svein-Ole Mikalsen
  - University of the Faroe Islands, Department of Science and Technology, Vestara Bryggja 15, FO-100, Tórshavn, Faroe Islands
- Eydna Í Homrum
  - Faroe Marine Research Institute, Nóatún 1, FO-100, Tórshavn, Faroe Islands
- Jan Arge Jacobsen
  - Faroe Marine Research Institute, Nóatún 1, FO-100, Tórshavn, Faroe Islands
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Hans Atli Dahl
  - Amplexa Genetics A/S, Hoyvíksvegur 51, FO-100, Tórshavn, Faroe Islands
|
32
|
Fan Y, Ye MS, Zhang JY, Xu L, Yu DD, Gu TL, Yao YL, Chen JQ, Lv LB, Zheng P, Wu DD, Zhang GJ, Yao YG. Chromosomal level assembly and population sequencing of the Chinese tree shrew genome. Zool Res 2019; 40:506-521. [PMID: 31418539 PMCID: PMC6822927 DOI: 10.24272/j.issn.2095-8137.2019.063] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 07/12/2019] [Accepted: 08/09/2019] [Indexed: 01/11/2023]
Abstract
Chinese tree shrews (Tupaia belangeri chinensis) have become an increasingly important experimental animal in biomedical research due to their close relationship to primates. An accurately sequenced and assembled genome is essential for understanding the genetic features and biology of this animal. In this study, we used long-read single-molecule sequencing and high-throughput chromosome conformation capture (Hi-C) technology to obtain a high-quality chromosome-scale scaffolding of the Chinese tree shrew genome. The new reference genome (KIZ version 2: TS_2.0) resolved problems in presently available tree shrew genomes and enabled accurate identification of large and complex repeat regions, gene structures, and species-specific genomic structural variants. In addition, by sequencing the genomes of six Chinese tree shrew individuals, we produced a comprehensive map of 12.8 M single nucleotide polymorphisms and confirmed that the major histocompatibility complex (MHC) loci and immunoglobulin gene family exhibited high nucleotide diversity in the tree shrew genome. We updated the tree shrew genome database (TreeshrewDB v2.0: http://www.treeshrewdb.org) to include the genome annotation information and genetic variations. The new high-quality reference genome of the Chinese tree shrew and the updated TreeshrewDB will facilitate the use of this animal in many different fields of research.
Affiliation(s)
- Yu Fan
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Mao-Sen Ye
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
- Jin-Yan Zhang
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
- Ling Xu
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Dan-Dan Yu
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Tian-Le Gu
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
- Yu-Lin Yao
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
- Jia-Qi Chen
  - Kunming Primate Research Center of the Chinese Academy of Sciences, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Long-Bao Lv
  - Kunming Primate Research Center of the Chinese Academy of Sciences, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Ping Zheng
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
  - Kunming Primate Research Center of the Chinese Academy of Sciences, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Yunnan Key Laboratory of Animal Reproduction, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Dong-Dong Wu
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Guo-Jie Zhang
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
- Yong-Gang Yao
  - Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming Yunnan 650204, China
  - Kunming Primate Research Center of the Chinese Academy of Sciences, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming Yunnan 650223, China
|
33
|
Horizontal acquisition of a patchwork Calvin cycle by symbiotic and free-living Campylobacterota (formerly Epsilonproteobacteria). ISME JOURNAL 2019; 14:104-122. [PMID: 31562384 PMCID: PMC6908604 DOI: 10.1038/s41396-019-0508-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/07/2019] [Revised: 08/06/2019] [Accepted: 08/15/2019] [Indexed: 11/30/2022]
Abstract
Most autotrophs use the Calvin–Benson–Bassham (CBB) cycle for carbon fixation. In contrast, all currently described autotrophs from the Campylobacterota (previously Epsilonproteobacteria) use the reductive tricarboxylic acid cycle (rTCA) instead. We discovered campylobacterotal epibionts (“Candidatus Thiobarba”) of deep-sea mussels that have acquired a complete CBB cycle and may have lost most key genes of the rTCA cycle. Intriguingly, the phylogenies of campylobacterotal CBB cycle genes suggest they were acquired in multiple transfers from Gammaproteobacteria closely related to sulfur-oxidizing endosymbionts associated with the mussels, as well as from Betaproteobacteria. We hypothesize that “Ca. Thiobarba” switched from the rTCA cycle to a fully functional CBB cycle during its evolution, by acquiring genes from multiple sources, including co-occurring symbionts. We also found key CBB cycle genes in free-living Campylobacterota, suggesting that the CBB cycle may be more widespread in this phylum than previously known. Metatranscriptomics and metaproteomics confirmed high expression of CBB cycle genes in mussel-associated “Ca. Thiobarba”. Direct stable isotope fingerprinting showed that “Ca. Thiobarba” has typical CBB signatures, suggesting that it uses this cycle for carbon fixation. Our discovery calls into question current assumptions about the distribution of carbon fixation pathways in microbial lineages, and the interpretation of stable isotope measurements in the environment.
|
34
|
Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2019; 19:329-346. [PMID: 29599501 DOI: 10.1038/s41576-018-0003-4] [Citation(s) in RCA: 289] [Impact Index Per Article: 57.8] [Indexed: 01/11/2023]
Abstract
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Affiliation(s)
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hayan Lee
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Charlotte A Darby
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
35
|
Lam TJ, Ye Y. Long reads reveal the diversification and dynamics of CRISPR reservoir in microbiomes. BMC Genomics 2019; 20:567. [PMID: 31288753 PMCID: PMC6617893 DOI: 10.1186/s12864-019-5922-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/08/2019] [Accepted: 06/21/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Sequencing of microbiomes has accelerated the characterization of the diversity of CRISPR-Cas immune systems. However, the utilization of next generation short read sequences for the characterization of CRISPR-Cas dynamics remains limited due to the repetitive nature of CRISPR arrays. CRISPR arrays are comprised of short spacer segments (derived from invaders' genomes) interspaced between flanking repeat sequences. The repetitive structure of CRISPR arrays poses a computational challenge for the accurate assembly of CRISPR arrays from short reads. In this paper we evaluate the use of long read sequences for the analysis of CRISPR-Cas system dynamics in microbiomes. RESULTS We analyzed a dataset of Illumina's TruSeq Synthetic Long-Reads (SLR) derived from a gut microbiome. We showed that long reads captured CRISPR spacers at a high degree of redundancy, which highlights the spacer conservation of spacer sharing CRISPR variants, enabling the study of CRISPR array dynamics in ways difficult to achieve through short read sequences. We introduce compressed spacer graphs, a visual abstraction of spacer sharing CRISPR arrays, to provide a simplified view of complex organizational structures present within CRISPR array dynamics. Utilizing compressed spacer graphs, several key defining characteristics of CRISPR-Cas system dynamics were observed including spacer acquisition and loss events, conservation of the trailer end spacers, and CRISPR arrays' directionality (transcription orientation). Other result highlights include the observation of intense array contraction and expansion events, and reconstruction of a full-length genome for a potential invader (Faecalibacterium phage) based on identified spacers. CONCLUSION We demonstrate in an in silico system that long reads provide the necessary context for characterizing the organization of CRISPR arrays in a microbiome, and reveal dynamic and evolutionary features of CRISPR-Cas systems in a microbial population.
Affiliation(s)
- Tony J Lam
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
- Yuzhen Ye
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
|
36
|
Su C, Weir JD, Zhang F, Yan H, Wu T. ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics 2019; 20:373. [PMID: 31269893 PMCID: PMC6610807 DOI: 10.1186/s12859-019-2948-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/23/2018] [Accepted: 06/12/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA molecules play many crucial roles in living systems. The spatial complexity that exists in RNA structures determines their cellular functions. Therefore, understanding RNA folding conformations, in particular, RNA secondary structures, is critical for elucidating biological functions. Existing literature has focused on RNA design as either an RNA structure prediction problem or an RNA inverse folding problem where free energy has played a key role. RESULTS In this research, we propose a Positive-Unlabeled data-driven framework termed ENTRNA. Other than free energy and commonly studied sequence and structural features, we propose a new feature, Sequence Segment Entropy (SSE), to measure the diversity of RNA sequences. ENTRNA is trained and cross-validated using 1024 pseudoknot-free RNAs and 1060 pseudoknotted RNAs from the RNASTRAND database respectively. To test the robustness of the ENTRNA, the models are further blind tested on 206 pseudoknot-free and 93 pseudoknotted RNAs from the PDB database. For pseudoknot-free RNAs, ENTRNA has 86.5% sensitivity on the training dataset and 80.6% sensitivity on the testing dataset. For pseudoknotted RNAs, ENTRNA shows 81.5% sensitivity on the training dataset and 71.0% on the testing dataset. To test the applicability of ENTRNA to long structural-complex RNA, we collect 5 laboratory synthetic RNAs ranging from 1618 to 1790 nucleotides. ENTRNA is able to predict the foldability of 4 RNAs. CONCLUSION In this article, we reformulate the RNA design problem as a foldability prediction problem which is to predict the likelihood of the co-existence of a sequence-structure pair. This new construct has the potential for both RNA structure prediction and the inverse folding problem. In addition, this new construct enables us to explore data-driven approaches in RNA research.
Affiliation(s)
- Congzhe Su: School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA
- Jeffery D. Weir: Department of Operational Sciences, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, Dayton, OH 45433 USA
- Fei Zhang: Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Hao Yan: Biodesign Center for Molecular Design and Biomimetics, The Biodesign Institute & School of Molecular Sciences, Arizona State University, Tempe, AZ 85281 USA
- Teresa Wu: School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA
|
37
|
Marijon P, Chikhi R, Varré JS. Graph analysis of fragmented long-read bacterial genome assemblies. Bioinformatics 2019; 35:4239-4246. [DOI: 10.1093/bioinformatics/btz219] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/09/2018] [Revised: 02/19/2019] [Accepted: 03/26/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.
Results
We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
Availability and implementation
https://gitlab.inria.fr/pmarijon/knot
Supplementary information
Supplementary data are available at Bioinformatics online.
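The contig-ordering idea in the Results above (enumerating weighted Hamiltonian cycles and ranking them) can be illustrated with a small brute-force sketch. This is not the authors' implementation: the function name `rank_contig_orders`, the weight table, and the convention that a lower total weight means a better-supported order are all assumptions made for illustration.

```python
from itertools import permutations


def rank_contig_orders(contigs, weights):
    """Rank circular contig orders by total edge weight (lower is better).

    `weights` is a hypothetical table mapping frozenset({a, b}) to the cost
    of placing contigs a and b next to each other. Orders that need a pair
    missing from the table are skipped.
    """
    first, rest = contigs[0], tuple(contigs[1:])
    seen = set()
    ranked = []
    for perm in permutations(rest):
        cycle = (first,) + perm
        # A cycle and its mirror image describe the same circular order.
        mirrored = (first,) + perm[::-1]
        key = min(cycle, mirrored)
        if key in seen:
            continue
        seen.add(key)
        cost = 0.0
        for a, b in zip(key, key[1:] + key[:1]):
            w = weights.get(frozenset((a, b)))
            if w is None:
                break  # no recorded link between a and b
            cost += w
        else:
            ranked.append((cost, key))
    ranked.sort()
    return ranked


# Toy example with four hypothetical contigs.
toy_weights = {
    frozenset(("A", "B")): 1, frozenset(("B", "C")): 1,
    frozenset(("C", "D")): 1, frozenset(("D", "A")): 1,
    frozenset(("A", "C")): 5, frozenset(("B", "D")): 5,
}
for cost, order in rank_contig_orders(["A", "B", "C", "D"], toy_weights):
    print(cost, " -> ".join(order))
```

Brute-force enumeration grows factorially with the number of contigs, so a sketch like this is only practical for assemblies fragmented into a handful of contigs.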
Affiliation(s)
- Pierre Marijon: Inria, Université de Lille, CNRS, Centrale Lille, UMR 9189 – CRIStAL, Lille F-59000, France
- Rayan Chikhi: Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France
- Jean-Stéphane Varré: Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 – CRIStAL, Lille F-59000, France
|
39
|
McGowan J, Byrne KP, Fitzpatrick DA. Comparative Analysis of Oomycete Genome Evolution Using the Oomycete Gene Order Browser (OGOB). Genome Biol Evol 2019; 11:189-206. [PMID: 30535146 PMCID: PMC6330052 DOI: 10.1093/gbe/evy267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 12/10/2018] [Indexed: 01/01/2023] Open
Abstract
The oomycetes are a class of microscopic, filamentous eukaryotes within the stramenopiles–alveolates–rhizaria eukaryotic supergroup. They include some of the most destructive pathogens of animals and plants, such as Phytophthora infestans, the causative agent of late potato blight. Despite the threat they pose to worldwide food security and natural ecosystems, there is a lack of tools and databases available to study oomycete genetics and evolution. To this end, we have developed the Oomycete Gene Order Browser (OGOB), a curated database that facilitates comparative genomic and syntenic analyses of oomycete species. OGOB incorporates genomic data for 20 oomycete species including functional annotations and a number of bioinformatics tools. OGOB hosts a robust set of orthologous oomycete genes for evolutionary analyses. Here, we present the structure and function of OGOB as well as a number of comparative genomic analyses we have performed to better understand oomycete genome evolution. We analyze the extent of oomycete gene duplication and identify tandem gene duplication as a driving force of the expansion of secreted oomycete genes. We identify core genes that are present and microsyntenically conserved (termed syntenologs) in oomycete lineages and identify the degree of microsynteny between each pair of the 20 species housed in OGOB. Consistent with previous comparative synteny analyses between a small number of oomycete species, our results reveal an extensive degree of microsyntenic conservation amongst genes with housekeeping functions within the oomycetes. OGOB is available at https://ogob.ie.
Affiliation(s)
- Jamie McGowan: Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland; Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
- Kevin P Byrne: School of Medicine, UCD Conway Institute, University College Dublin, Ireland
- David A Fitzpatrick: Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland; Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
|
40
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019; 8:5256637. [PMID: 30576505 PMCID: PMC6324547 DOI: 10.1093/gigascience/giy157] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 09/05/2018] [Accepted: 11/27/2018] [Indexed: 02/05/2023] Open
Abstract
Background Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. Findings We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool utilizes long reads generated from TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster, with a lower error rate and much lower memory usage, than two existing state-of-the-art tools. The tool filled more gaps with raw reads than with error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and from large and complex genomes. After performing gap closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. Conclusions LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
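For readers unfamiliar with the contig N50 metric quoted in this abstract, the following minimal sketch shows the standard definition (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the assembly) and re-derives the fold change from the rounded figures. The function name `n50` and the toy lengths are illustrative assumptions, not part of LR_Gapcloser.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the total.
        if 2 * running >= total:
            return length


# Toy assembly (hypothetical lengths): total 20, and 6 + 5 = 11 >= 10.
print(n50([2, 3, 4, 5, 6]))  # 5

# Fold change from the rounded values in the abstract (143 kb -> 19 Mb).
print(round(19_000_000 / 143_000, 1))  # 132.9
```

With the rounded values the ratio comes out near 133; the reported 132-fold figure presumably reflects the unrounded N50 values.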
Affiliation(s)
- Gui-Cai Xu: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China; College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
- Tian-Jun Xu: College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
- Rui Zhu: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China; College of Fisheries and Life Science, Shanghai Ocean University, 999 Huchenghuan Road, Shanghai, 201306, China
- Yan Zhang: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Shang-Qi Li: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Hong-Wei Wang: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Jiong-Tang Li: Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
|
41
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019. [PMID: 30576505 DOI: 10.5524/100540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/13/2023] Open
|
42
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019. [PMID: 30576505 DOI: 10.1093/gigascience/giy157/5256637] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/29/2023] Open
|
43
|
Boncan DAT, David AME, Lluisma AO. A CAZyme-Rich Genome of a Taxonomically Novel Rhodophyte-Associated Carrageenolytic Marine Bacterium. MARINE BIOTECHNOLOGY (NEW YORK, N.Y.) 2018; 20:685-705. [PMID: 29936557 DOI: 10.1007/s10126-018-9840-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/07/2018] [Accepted: 06/07/2018] [Indexed: 06/08/2023]
Abstract
Carbohydrate-active enzymes (CAZymes) have significant biotechnological potential as agents for degradation or modification of polysaccharides/glycans. As marine macroalgae are known to be rich in various types of polysaccharides, seaweed-associated bacteria are likely to be a good source of these CAZymes. A genomics approach can be used to explore CAZyme abundance and diversity, but it can also provide deep insights into the biology of CAZyme producers and, in particular, into molecular mechanisms that mediate their interaction with their hosts. In this study, a Gram-negative, aerobic, rod-shaped, carrageenolytic, and culturable marine bacterium designated as AOL6 was isolated from a diseased thallus of a carrageenan-producing farmed rhodophyte, Kappaphycus alvarezii (Gigartinales, Rhodophyta). The whole genome of this bacterium was sequenced and characterized. Sequence reads were assembled, producing a high-quality genome assembly. The estimated genome size of the bacterium is 4.4 Mb, with a G+C content of 52%. Molecular phylogenetic analysis based on a complete sequence of 16S rRNA, rpoB, and a set of 38 single-copy genes suggests that the bacterium is an unknown species and represents a novel genus in the family Cellvibrionaceae that is most closely related to the genera Teredinibacter and Saccharophagus. Genome comparison with T. turnerae T7901 and S. degradans 2-40 reveals several features shared by the three species, including a large number of CAZymes that comprise > 5% of the total number of protein-coding genes. The high proportion of CAZymes found in the AOL6 genome exceeds that of other known carbohydrate degraders, suggesting a significant capacity to degrade a range of polysaccharides including κ-carrageenan; 34% of these CAZymes have signal peptide sequences for secretion. Three putative κ-carrageenase-encoding genes were identified from the genome of the bacterium via in silico analysis, consistent with the results of the zymography assay (with κ-carrageenan as substrate). Genome analysis also indicated that AOL6 relies exclusively on the type 2 secretion system (T2SS) for secreting proteins (possibly including glycoside hydrolases). In relation to T2SS, the product of the pilZ gene was predicted to be highly expressed, suggesting specialization for cell adhesion and secretion of virulence factors. The assignment of proteins to clusters of orthologous groups (COGs) revealed a pattern characteristic of r-strategists. The majority of two-component system proteins identified in the AOL6 genome were also predicted to be involved in chemotaxis and surface colonization. These genomic features suggest that AOL6 is an opportunistic pathogen, adapted to colonizing polysaccharide-rich hosts, including carrageenophytes.
Affiliation(s)
- Delbert Almerick T Boncan: Marine Science Institute, College of Science, University of the Philippines Diliman, 1101, Quezon City, Philippines; National Institute of Molecular Biology and Biotechnology, College of Science, University of the Philippines Diliman, 1101, Quezon City, Philippines
- Anne Marjorie E David: Marine Science Institute, College of Science, University of the Philippines Diliman, 1101, Quezon City, Philippines; Institute of Biology, College of Science, University of the Philippines Diliman, 1101, Quezon City, Philippines
- Arturo O Lluisma: Marine Science Institute, College of Science, University of the Philippines Diliman, 1101, Quezon City, Philippines
|
44
|
Wu B, Li M, Liao X, Luo J, Wu F, Pan Y, Wang J. MEC: Misassembly Error Correction in contigs based on distribution of paired-end reads and statistics of GC-contents. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 17:847-857. [PMID: 30334805 DOI: 10.1109/tcbb.2018.2876855] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
De novo assembly tools aim at reconstructing genomes from next-generation sequencing (NGS) data. However, these tools usually generate a large number of contigs containing many misassemblies, which are caused by repetitive regions, chimeric reads and sequencing errors. Because it can improve the accuracy of assembly results, detecting and correcting the misassemblies in contigs is appealing, yet challenging. In this study, a novel method, called MEC, is proposed to identify and correct misassemblies in contigs. Based on the insert size distribution of paired-end reads and the statistical analysis of GC-contents, MEC can identify more misassemblies accurately. We evaluate MEC with the metrics NA50 and NGA50 on four datasets, compare it with most of the available misassembly correction tools, and carry out experiments to analyze the influence of MEC on scaffolding results. These experiments show that MEC can reduce misassemblies effectively and yield quantitative improvements in scaffolding quality. MEC is publicly available at https://github.com/bioinfomaticsCSU/MEC.
|
45
|
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 2018; 33:2202-2204. [PMID: 28369201 DOI: 10.1093/bioinformatics/btx153] [Citation(s) in RCA: 952] [Impact Index Per Article: 158.7] [Received: 09/18/2016] [Accepted: 03/17/2017] [Indexed: 02/03/2023] Open
Abstract
Summary
GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content, from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates.
Availability and Implementation
http://genomescope.org, https://github.com/schatzlab/genomescope.git
Contact
mschatz@jhu.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
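To give a feel for how genome size can be read off short reads without a reference, here is a deliberately simplified sketch based on a k-mer coverage histogram. It is not GenomeScope's model; it only applies the common back-of-the-envelope rule that genome size is roughly the number of non-error k-mer occurrences divided by the coverage at the main histogram peak. The function name, the `min_coverage` cutoff and the toy histogram are all assumptions.

```python
def estimate_genome_size(histogram, min_coverage=5):
    """Rough genome size from a k-mer histogram {coverage: distinct k-mers}.

    Bins below `min_coverage` are treated as sequencing errors and ignored
    (an assumed cutoff; real data needs a data-driven choice).
    """
    kept = {cov: n for cov, n in histogram.items() if cov >= min_coverage}
    if not kept:
        raise ValueError("no histogram bins at or above min_coverage")
    # Coverage of the tallest remaining bin, taken as the main peak.
    peak = max(kept, key=kept.get)
    # Total k-mer occurrences that are assumed to come from the genome.
    total_kmers = sum(cov * n for cov, n in kept.items())
    return total_kmers / peak


# Toy histogram: low-coverage error k-mers plus a single peak at 20x.
toy_histogram = {1: 500, 2: 100, 19: 200, 20: 600, 21: 200}
print(estimate_genome_size(toy_histogram))  # 1000.0
```

For heterozygous genomes the tallest bin may be the heterozygous peak rather than the homozygous one, which is one reason a fitted model is preferable to this shortcut.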
Affiliation(s)
- Gregory W Vurture: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Fritz J Sedlazeck: Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
- Maria Nattestad: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Charles J Underwood: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Han Fang: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA
- James Gurtowski: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael C Schatz: Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
|
46
|
Biscotti MA, Barucca M, Canapa A. New insights into the genome repetitive fraction of the Antarctic bivalve Adamussium colbecki. PLoS One 2018; 13:e0194502. [PMID: 29590185 PMCID: PMC5874043 DOI: 10.1371/journal.pone.0194502] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/18/2017] [Accepted: 03/05/2018] [Indexed: 11/29/2022] Open
Abstract
Repetitive DNA represents the major component of the genome in both plant and animal species. It includes transposable elements (TEs), which are dispersed throughout the genome, and satellite DNAs (satDNAs), which are tandemly organized in long arrays. The study of the structure and organization of repetitive DNA contributes to our understanding of genome architecture and the mechanisms leading to its evolution. Molluscs represent one of the largest groups of invertebrates and include organisms with a wide variety of morphologies and lifestyles. To increase our knowledge of bivalves at the genome level, we analysed the Antarctic scallop Adamussium colbecki. Screening of the genomic library revealed two novel satDNA elements and the CvA transposon. The interspecific investigation performed in this study demonstrated that one of the two satDNAs isolated in A. colbecki is widespread in polar molluscan species, indicating a possible link between repetitive DNA and abiotic factors. Moreover, the transcriptional activity of CvA and its presence in long-diverged bivalves suggest a possible role for this ancient element in shaping the genome architecture of this clade.
Affiliation(s)
- Maria Assunta Biscotti: Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona, Italy
- Marco Barucca: Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona, Italy
- Adriana Canapa: Dipartimento di Scienze della Vita e dell'Ambiente, Università Politecnica delle Marche, Ancona, Italy
|
47
|
Darracq A, Vitte C, Nicolas S, Duarte J, Pichon JP, Mary-Huard T, Chevalier C, Bérard A, Le Paslier MC, Rogowsky P, Charcosset A, Joets J. Sequence analysis of European maize inbred line F2 provides new insights into molecular and chromosomal characteristics of presence/absence variants. BMC Genomics 2018; 19:119. [PMID: 29402214 PMCID: PMC5800051 DOI: 10.1186/s12864-018-4490-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 04/04/2017] [Accepted: 01/22/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only scarcely characterized at the sequence level, and the extent of presence/absence variation and the relative chromosomal landscape of inbred-specific regions remain to be elucidated. RESULTS De novo genome sequencing of the French F2 maize inbred line revealed 10,044 novel genomic regions larger than 1 kb, making up 88 Mb of DNA, that are present in F2 but not in B73 (PAV). This set of maize PAV sequences allowed us to annotate PAV content and to analyze sequence breakpoints. Using PAV genotyping on a collection of 25 temperate lines, we also analyzed linkage disequilibrium in PAVs and flanking regions, and PAV frequencies within maize genetic groups. CONCLUSIONS We highlight the possible role of MMEJ-type double strand break repair in maize PAV formation and discover 395 new genes with transcriptional support. The pattern of linkage disequilibrium within PAVs differs strikingly from that of flanking regions and is in accordance with the intuition that PAVs may recombine less than other genomic regions. We show that most PAVs are ancient, while some are found only in European Flint material, thus pinpointing structural features that may be at the origin of adaptive traits involved in the success of this material. Characterization of such PAVs will provide useful material for further association genetic studies in European and temperate maize.
Affiliation(s)
- Aude Darracq: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Clémentine Vitte: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Stéphane Nicolas: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Tristan Mary-Huard: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France; MIA, INRA, AgroParisTech, Université Paris-Saclay, Paris, France
- Céline Chevalier: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Aurélie Bérard: EPGV US 1279, INRA, CEA, IG-CNG, Université Paris-Saclay, Evry, France
- Peter Rogowsky: Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRA, Lyon, France
- Alain Charcosset: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Johann Joets: Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
|
48
|
Dominguez Del Angel V, Hjerde E, Sterck L, Capella-Gutierrez S, Notredame C, Vinnere Pettersson O, Amselem J, Bouri L, Bocs S, Klopp C, Gibrat JF, Vlasova A, Leskosek BL, Soler L, Binzer-Panchal M, Lantz H. Ten steps to get started in Genome Assembly and Annotation. F1000Res 2018; 7. [PMID: 29568489 PMCID: PMC5850084 DOI: 10.12688/f1000research.13598.1] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Accepted: 01/19/2018] [Indexed: 12/16/2022] Open
Abstract
As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).
Affiliation(s)
- Erik Hjerde
- Department of Chemistry, Norstruct, UiT The Arctic University of Norway, Tromsø, 9019, Norway
- Lieven Sterck
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium; VIB-UGent Center for Plant Systems Biology, Ghent University - VIB, Technologiepark 927, 9052 Ghent, Belgium
- Salvador Capella-Gutierrez
- Spanish National Bioinformatics Institute (INB), Barcelona, Spain; Barcelona Supercomputing Center (BSC), Centro Nacional de Supercomputación, Barcelona, Spain
- Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Olga Vinnere Pettersson
- Uppsala Genome Center, NGI/SciLifeLab, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, SE-752 37, Sweden
- Joelle Amselem
- URGI, INRA, Université Paris-Saclay, Versailles, 78026, France
- Laurent Bouri
- Institut Français de Bioinformatique, UMS3601-CNRS, Université Paris-Saclay, Orsay, 91403, France
- Stephanie Bocs
- CIRAD, UMR AGAP, Montpellier, 34398, France; AGAP, Cirad, INRA, Montpellier SupAgro, Universite Montpellier, Montpellier, France; South Green Bioinformatics Platform, Montpellier, France
- Jean-Francois Gibrat
- Institut Français de Bioinformatique, UMS3601-CNRS, Université Paris-Saclay, Orsay, 91403, France; Unité de recherche, INRA, Université Paris-Saclay, 78350 Jouy-en-Josas, France
- Anna Vlasova
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Brane L Leskosek
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Lucile Soler
- IMBIM/NBIS/SciLifeLab, Uppsala University, Uppsala, Sweden
- Henrik Lantz
- IMBIM/NBIS/SciLifeLab, Uppsala University, Uppsala, Sweden
|
49
|
Ko YJ, Kim JS, Kim S. misMM: An Integrated Pipeline for Misassembly Detection Using Genotyping-by-Sequencing and Its Validation with BAC End Library Sequences and Gene Synteny. Genomics Inform 2017; 15:128-135. [PMID: 29307138 PMCID: PMC5769862 DOI: 10.5808/gi.2017.15.4.128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2017] [Accepted: 11/02/2017] [Indexed: 11/25/2022] Open
Abstract
As next-generation sequencing technologies have advanced, enormous amounts of whole-genome sequence information from various species have been released. However, it is still difficult to assemble a whole genome precisely, owing to the inherent limitations of short-read sequencing technologies. In particular, plant genomes are far more complex than those of microorganisms or animals because of whole-genome duplications, repeat insertions, Numt insertions, and similar features. In this study, we describe a new method for detecting misassembled sequence regions of Brassica rapa with genotyping-by-sequencing, followed by MadMapper clustering. The misassembly candidate regions were cross-checked with BAC clone paired-end library sequences that had been mapped to the reference genome. The results were further verified with gene synteny relations between Brassica rapa and Arabidopsis thaliana. We conclude that this method will help detect misassembled regions and be applicable to incompletely assembled reference genomes from a variety of species.
Affiliation(s)
- Young-Joon Ko
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea
- Jung Sun Kim
- Genomics Division, Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Sangsoo Kim
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Korea
- Corresponding author: Tel: +82-2-820-0457, Fax: +82-2-824-4383, E-mail:
|
50
|
Abstract
Background Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. Results We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it with a 2-approximation algorithm. Conclusions Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
|