1
Dida F, Yi G. Empirical evaluation of methods for de novo genome assembly. PeerJ Comput Sci 2021; 7:e636. [PMID: 34307867 PMCID: PMC8279138 DOI: 10.7717/peerj-cs.636] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/04/2021] [Accepted: 06/19/2021] [Indexed: 06/12/2023]
Abstract
Technologies for next-generation sequencing (NGS) have stimulated an exponential rise in high-throughput sequencing projects and resulted in the development of new read-assembly algorithms. A drastic reduction in the costs of generating short reads on the genomes of new organisms is attributable to recent advances in NGS technologies such as Ion Torrent, Illumina, and PacBio. Genome research has led to the creation of high-quality reference genomes for several organisms, and de novo assembly is a key initiative that has facilitated gene discovery and other studies. More powerful analytical algorithms are needed to work on the increasing amount of sequence data. We make a thorough comparison of the de novo assembly algorithms to allow new users to clearly understand the assembly algorithms: overlap-layout-consensus and de-Bruijn-graph, string-graph based assembly, and hybrid approach. We also address the computational efficacy of each algorithm's performance, challenges faced by the assembly tools used, and the impact of repeats. Our results compare the relative performance of the different assemblers and other related assembly differences with and without the reference genome. We hope that this analysis will contribute to further the application of de novo sequences and help the future growth of assembly algorithms.
Affiliation(s)
- Firaol Dida
- Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
- Gangman Yi
- Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
2
Guo G, Chen H, Yan D, Cheng J, Chen JY, Chong Z. Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:731-744. [PMID: 31180898 DOI: 10.1109/tcbb.2019.2920912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be assembled to implement various sequencing strategies. PPA-assembler adopts the popular de Bruijn graph based approach for sequencing, and each operation is implemented as a program in Google's Pregel framework which can be easily deployed in a generic cluster. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than the state of the art while providing comparable sequencing quality. PPA-assembler has been open-sourced at https://github.com/yaobaiwei/PPA-Assembler.
3
GAAP: A Genome Assembly + Annotation Pipeline. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4767354. [PMID: 31346518 PMCID: PMC6617929 DOI: 10.1155/2019/4767354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/24/2019] [Revised: 05/20/2019] [Accepted: 05/26/2019] [Indexed: 12/24/2022]
Abstract
Genomic analysis begins with de novo assembly of short-read fragments in order to reconstruct full-length base sequences without exploiting a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools have been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their objects of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter setting methods, and input formats; thus, it remains difficult for individual researchers to select, utilize, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.
4
Lee CY, Hsieh PH, Chiang LM, Chattopadhyay A, Li KY, Lee YF, Lu TP, Lai LC, Lin EC, Lee H, Ding ST, Tsai MH, Chen CY, Chuang EY. Whole-genome de novo sequencing reveals unique genes that contributed to the adaptive evolution of the Mikado pheasant. Gigascience 2018; 7:4990948. [PMID: 29722814 PMCID: PMC5941149 DOI: 10.1093/gigascience/giy044] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/31/2017] [Accepted: 04/13/2018] [Indexed: 01/10/2023]
Abstract
Background: The Mikado pheasant (Syrmaticus mikado) is a nearly endangered species indigenous to high-altitude regions of Taiwan. This pheasant provides an opportunity to investigate evolutionary processes following geographic isolation. Currently, the genetic background and adaptive evolution of the Mikado pheasant remain unclear. Results: We present the draft genome of the Mikado pheasant, which consists of 1.04 Gb of DNA and 15,972 annotated protein-coding genes. The Mikado pheasant displays expansion and positive selection of genes related to features that contribute to its adaptive evolution, such as energy metabolism, oxygen transport, hemoglobin binding, radiation response, immune response, and DNA repair. To investigate the molecular evolution of the major histocompatibility complex (MHC) across several avian species, 39 putative genes spanning 227 kb on a contiguous region were annotated and manually curated. The MHC loci of the pheasant revealed a high level of synteny, several rapidly evolving genes, and inverse regions compared to the same loci in the chicken. The complete mitochondrial genome was also sequenced, assembled, and compared against four long-tailed pheasants. The results from molecular clock analysis suggest that ancestors of the Mikado pheasant migrated from the north to Taiwan about 3.47 million years ago. Conclusions: This study provides a valuable genomic resource for the Mikado pheasant, insights into its adaptation to high altitude, and the evolutionary history of the genus Syrmaticus, which could potentially be useful for future studies that investigate molecular evolution, genomics, ecology, and immunogenetics.
Affiliation(s)
- Chien-Yueh Lee
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
- Ping-Han Hsieh
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
- Li-Mei Chiang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
- Amrita Chattopadhyay
- Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei 10055, Taiwan
- Kuan-Yi Li
- Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan; Institute of Plant and Microbial Biology, Academia Sinica, Taipei, 11529, Taiwan
- Yi-Fang Lee
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan
- Tzu-Pin Lu
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10055, Taiwan
- Liang-Chuan Lai
- Graduate Institute of Physiology, National Taiwan University, Taipei 10051, Taiwan
- En-Chung Lin
- Department of Animal Science and Technology, National Taiwan University, Taipei 10617, Taiwan
- Hsinyu Lee
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan; Department of Life Science, National Taiwan University, Taipei 10617, Taiwan; Center for Biotechnology, National Taiwan University, Taipei 10672, Taiwan
- Shih-Torng Ding
- Department of Animal Science and Technology, National Taiwan University, Taipei 10617, Taiwan; Center for Biotechnology, National Taiwan University, Taipei 10672, Taiwan
- Mong-Hsun Tsai
- Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei 10055, Taiwan; Center for Biotechnology, National Taiwan University, Taipei 10672, Taiwan; Institute of Biotechnology, National Taiwan University, Taipei 10672, Taiwan; Agricultural Biotechnology Research Center, Academia Sinica, Taipei 11529, Taiwan University, Taipei, Taiwan
- Chien-Yu Chen
- Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 10617, Taiwan; Center for Biotechnology, National Taiwan University, Taipei 10672, Taiwan; Center for Systems Biology, National Taiwan University, Taipei 10672, Taiwan
- Eric Y Chuang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 10617, Taiwan; Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei 10055, Taiwan; Graduate Institute of Chinese Medical Science, China Medical University, Taichung 40402, Taiwan
5
Sohn JI, Nam JW. The present and future of de novo whole-genome assembly. Brief Bioinform 2018; 19:23-40. [PMID: 27742661 DOI: 10.1093/bib/bbw096] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 06/03/2016] [Indexed: 12/15/2022]
Abstract
With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical or computational challenges in de novo assembly still remain, although many bright ideas and heuristics have been suggested to tackle the challenges in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graphs (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of the short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, the long read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome.
6
González-Torres P, Gabaldón T. Genome Variation in the Model Halophilic Bacterium Salinibacter ruber. Front Microbiol 2018; 9:1499. [PMID: 30072959 PMCID: PMC6060240 DOI: 10.3389/fmicb.2018.01499] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/09/2018] [Accepted: 06/18/2018] [Indexed: 01/08/2023]
Abstract
The halophilic bacterium Salinibacter ruber is an abundant and ecologically important member of halophilic communities worldwide. Given its broad distribution and high intraspecific genetic diversity, S. ruber is considered one of the main models for ecological and evolutionary studies of bacterial adaptation to hypersaline environments. However, current insight into the genomic diversity of this species is limited to the comparison of the genomes of two co-isolated strains. Here, we present a comparative genomic analysis of eight S. ruber strains isolated at two different time points in each of two different Mediterranean solar salterns. Our results show an open pangenome with contrasting evolutionary patterns in the core and accessory genomes. We found that the core genome is shaped by extensive homologous recombination (HR), which results in limited sequence variation within population clusters. In contrast, the accessory genome is modulated by horizontal gene transfer (HGT), with genomic islands and plasmids acting as gateways to the rest of the genome. In addition, both types of genetic exchange are modulated by restriction and modification (RM) or CRISPR-Cas systems. Finally, genes differentially impacted by such processes reveal functional processes potentially relevant for environmental interactions and adaptation to extremophilic conditions. Altogether, our results support scenarios that conciliate “Neutral” and “Constant Diversity” models of bacterial evolution.
Affiliation(s)
- Pedro González-Torres
- Department of Physiology, Genetics and Microbiology, University of Alicante, Alicante, Spain; Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
7
Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018; 16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022]
Abstract
The application of advanced sequencing technologies and the rapid growth of various sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphism occur frequently in genomes, and each of these has different impacts on assembly. Further, many new applications for sequencing, such as metagenomics regarding multiple species, have emerged in recent years. These not only give rise to higher complexity but also prevent short-read assembly in an efficient way. This article reviews the theoretical foundations that underlie current mapping-based assembly and de novo-based assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey primary algorithms/software and offer a discussion on the emerging challenges in assembly.
8
Wagner JT, Singh PP, Romney AL, Riggs CL, Minx P, Woll SC, Roush J, Warren WC, Brunet A, Podrabsky JE. The genome of Austrofundulus limnaeus offers insights into extreme vertebrate stress tolerance and embryonic development. BMC Genomics 2018; 19:155. [PMID: 29463212 PMCID: PMC5819677 DOI: 10.1186/s12864-018-4539-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/10/2017] [Accepted: 02/12/2018] [Indexed: 11/21/2022]
Abstract
Background: The annual killifish Austrofundulus limnaeus inhabits ephemeral ponds in northern Venezuela, South America, and is an emerging extremophile model for vertebrate diapause, stress tolerance, and evolution. Embryos of A. limnaeus regularly experience extended periods of desiccation and anoxia as a part of their natural history and have unique metabolic and developmental adaptations. Currently, there are limited genomic resources available for gene expression and evolutionary studies that can take advantage of A. limnaeus as a unique model system. Results: We describe the first draft genome sequence of A. limnaeus. The genome was assembled de novo using a merged assembly strategy and was annotated using the NCBI Eukaryotic Annotation Pipeline. We show that the assembled genome has a high degree of completeness in genic regions that is on par with several other teleost genomes. Using RNA-seq and phylogenetic-based approaches, we identify several candidate genes that may be important for embryonic stress tolerance and post-diapause development in A. limnaeus. Several of these genes include heat shock proteins that have unique expression patterns in A. limnaeus embryos and at least one of these may be under positive selection. Conclusion: The A. limnaeus genome is the first South American annual killifish genome made publicly available. This genome will be a valuable resource for comparative genomics to determine the genetic and evolutionary mechanisms that support the unique biology of annual killifishes. In a broader context, this genome will be a valuable tool for exploring genome-environment interactions and their impacts on vertebrate physiology and evolution.
Affiliation(s)
- Josiah T Wagner
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA; Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, Oregon, USA
- Param Priya Singh
- Department of Genetics, Stanford University, Stanford, California, USA
- Amie L Romney
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA
- Claire L Riggs
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA
- Patrick Minx
- McDonnell Genome Institute at Washington University, St Louis, Missouri, USA
- Steven C Woll
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA
- Jake Roush
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA
- Wesley C Warren
- McDonnell Genome Institute at Washington University, St Louis, Missouri, USA
- Anne Brunet
- Department of Genetics, Stanford University, Stanford, California, USA; Glenn Center for the Biology of Aging, Stanford, California, USA
- Jason E Podrabsky
- Department of Biology, Center for Life in Extreme Environments, Portland State University, Portland, Oregon, USA
9
Tsai KJ, Lu MYJ, Yang KJ, Li M, Teng Y, Chen S, Ku MSB, Li WH. Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance. Sci Rep 2016; 6:35076. [PMID: 27734962 PMCID: PMC5062080 DOI: 10.1038/srep35076] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/15/2016] [Accepted: 09/23/2016] [Indexed: 12/23/2022]
Abstract
The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains.
Affiliation(s)
- Kevin J. Tsai
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, 11574 Taiwan
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, 11221 Taiwan
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Mei-Yeh Jade Lu
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Kai-Jung Yang
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Mengyun Li
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Yuchuan Teng
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Shihmay Chen
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Maurice S. B. Ku
- Department of Bioagricultural Science, National Chiayi University, Chiayi, 60004 Taiwan
- School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Wen-Hsiung Li
- Biodiversity Research Center, Academia Sinica, Taipei, 11574 Taiwan
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637 USA
10
Chawla V, Kumar R, Shankar R. Identifying wrong assemblies in de novo short read primary sequence assembly contigs. J Biosci 2016; 41:455-74. [PMID: 27581937 DOI: 10.1007/s12038-016-9630-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are done using de novo short read assemblers and other related approaches. However, the contigs produced this way are prone to mis-assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembling have been assessed for their role in causing mis-assembly by using different genome sequencing data. Finally, some mis-assembly detecting tools have been evaluated for their ability to detect the wrongly assembled primary contigs, suggesting a lot of scope for improvement in this area. The present work also proposes a simple, novel, unsupervised learning-based approach to identify mis-assemblies in the contigs, which was found to perform reasonably well when compared to the existing tools that report mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy.
Affiliation(s)
- Vandna Chawla
- Studio of Computational Biology and Bioinformatics, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
11
Bock CH, Chen C, Yu F, Stevenson KL, Wood BW. Draft genome sequence of Fusicladium effusum, cause of pecan scab. Stand Genomic Sci 2016; 11:36. [PMID: 27274782 PMCID: PMC4891892 DOI: 10.1186/s40793-016-0161-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 11/19/2015] [Accepted: 05/24/2016] [Indexed: 11/10/2022]
Abstract
Pecan scab, caused by the plant pathogenic fungus Fusicladium effusum, is the most destructive disease of pecan, an important specialty crop cultivated in several regions of the world. Only a few members of the family Venturiaceae (in which the pathogen resides) have been reported as sequenced. We report the first draft genome sequence (40.6 Mb) of an isolate of F. effusum collected from a pecan tree (cv. Desirable) in central Georgia, in the US. The genome sequence described will be a useful resource for research on the biology and ecology of the pathogen, coevolution with the pecan host, characterization of genes of interest, and development of markers for studies of genetic diversity, genotyping and phylogenetic analysis. The annotation of the genome is described and a phylogenetic analysis is presented.
Affiliation(s)
- Clive H. Bock
- Southeastern Fruit and Tree Nut Research Lab, USDA, Agricultural Research Service, 21 Dunbar Road, Byron, GA 31008 USA
- Chunxian Chen
- Southeastern Fruit and Tree Nut Research Lab, USDA, Agricultural Research Service, 21 Dunbar Road, Byron, GA 31008 USA
- Fahong Yu
- Interdisciplinary Center for Biotechnology Research, University of Florida, 2033 Mowry Road, Gainesville, FL 32610 USA
- Katherine L. Stevenson
- Department of Plant Pathology, University of Georgia, 2360 Rainwater Rd., Tifton, GA 31793 USA
- Bruce W. Wood
- Southeastern Fruit and Tree Nut Research Lab, USDA, Agricultural Research Service, 21 Dunbar Road, Byron, GA 31008 USA
12
Hou Y, Ma X, Wan W, Long N, Zhang J, Tan Y, Duan S, Zeng Y, Dong Y. Comparative Genomics of Pathogens Causing Brown Spot Disease of Tobacco: Alternaria longipes and Alternaria alternata. PLoS One 2016; 11:e0155258. [PMID: 27159564 PMCID: PMC4861331 DOI: 10.1371/journal.pone.0155258] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 01/07/2016] [Accepted: 04/26/2016] [Indexed: 12/30/2022]
Abstract
The genus Alternaria is a group of infectious/contagious pathogenic fungi that not only invade a wide range of crops but also induce severe allergic reactions in a part of the human population. In this study, two strains, Alternaria longipes cx1 and Alternaria alternata cx2, were isolated from different brown spot lesions on infected tobacco leaves. Their complete genomes were sequenced, de novo assembled, and comparatively analyzed. Phylogenetic analysis revealed that A. longipes cx1 and A. alternata cx2 diverged 3.3 million years ago, indicating a recent event of speciation. Seventeen non-ribosomal peptide synthetase (NRPS) genes and 13 polyketide synthase (PKS) genes in A. longipes cx1 and 13 NRPS genes and 12 PKS genes in A. alternata cx2 were identified in these two strains. Some of these genes were predicted to participate in the synthesis of non-host specific toxins (non-HSTs), such as tenuazonic acid (TeA), alternariol (AOH) and alternariol monomethyl ether (AME). By comparative genome analysis, we uncovered that A. longipes cx1 had more genes putatively involved in pathogen-plant interaction, more carbohydrate-degrading enzymes and more secreted proteins than A. alternata cx2. In summary, our results demonstrate the genomic distinction between A. longipes cx1 and A. alternata cx2. They will not only improve the understanding of the phylogenetic relationship among genus Alternaria, but more importantly provide valuable genomic resources for the investigation of plant-pathogen interaction.
Affiliation(s)
- Yujie Hou
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Xiao Ma
- Longrun Pu-erh Tea Academy, Yunnan Agricultural University, Kunming, Yunnan, China
- Wenting Wan
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Ni Long
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Jing Zhang
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Yuntao Tan
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Shengchang Duan
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Yan Zeng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Science, Kunming, Yunnan, China
- Yang Dong
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, China
- Biological Big Data College, Yunnan Agricultural University, Kunming, Yunnan, China
13
Liu H, Ma X, Yu H, Fang D, Li Y, Wang X, Wang W, Dong Y, Xiao B. Genomes and virulence difference between two physiological races of Phytophthora nicotianae. Gigascience 2016; 5:3. [PMID: 26823972 PMCID: PMC4730604 DOI: 10.1186/s13742-016-0108-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 09/15/2015] [Accepted: 01/06/2016] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Black shank is a severe plant disease caused by the soil-borne pathogen Phytophthora nicotianae. Two physiological races of P. nicotianae, races 0 and 1, are predominantly observed in cultivated tobacco fields around the world. Race 0 has been reported to be more aggressive, having a shorter incubation period, and causing worse root rot symptoms, while race 1 causes more severe necrosis. The molecular mechanisms underlying the difference in virulence between race 0 and 1 remain elusive. FINDINGS: We assembled and annotated the genomes of P. nicotianae races 0 and 1, which were obtained by a combination of PacBio single-molecule real-time sequencing and second-generation sequencing (both HiSeq and MiSeq platforms). Gene family analysis revealed a highly expanded ATP-binding cassette transporter gene family in P. nicotianae. Specifically, more RxLR effector genes were found in the genome of race 0 than in that of race 1. In addition, RxLR effector genes were found to be mainly distributed in gene-sparse, repeat-rich regions of the P. nicotianae genome. CONCLUSIONS: These results provide not only high quality reference genomes of P. nicotianae, but also insights into the infection mechanisms of P. nicotianae and its co-evolution with the host plant. They also reveal insights into the difference in virulence between the two physiological races.
Collapse
Affiliation(s)
- Hui Liu
- CAS-Max Planck Junior Research Group, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
- Xiao Ma
- Yunnan Agricultural University, Kunming, 650100 China
- Haiqin Yu
- Yunnan Academy of Tobacco Agricultural Sciences, Yuantong Street No.33, Kunming, Yunnan 650021 China
- Dunhuang Fang
- Yunnan Academy of Tobacco Agricultural Sciences, Yuantong Street No.33, Kunming, Yunnan 650021 China
- Yongping Li
- Yunnan Academy of Tobacco Agricultural Sciences, Yuantong Street No.33, Kunming, Yunnan 650021 China
- Xiao Wang
- CAS-Max Planck Junior Research Group, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
- University of Chinese Academy of Sciences, Beijing, 100049 China
- Wen Wang
- CAS-Max Planck Junior Research Group, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
- Yang Dong
- Yunnan Agricultural University, Kunming, 650100 China
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650500 China
- Bingguang Xiao
- Yunnan Academy of Tobacco Agricultural Sciences, Yuantong Street No.33, Kunming, Yunnan 650021 China
14
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. Data-driven searches such as GWAS have not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In recent years, the development of several so-called next generation sequencing (NGS) platforms has offered the possibility of generating fast, inexpensive and accurate genomic information. So far, however, their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss the potential and challenges of NGS-based studies.
Collapse
Affiliation(s)
- Marisa L R Cunha
- Marisa L. R. Cunha, Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel.: +31 20 5662824, Fax: +31 20 6968833, E-mail:
15
Möbius P, Hölzer M, Felder M, Nordsiek G, Groth M, Köhler H, Reichwald K, Platzer M, Marz M. Comprehensive insights in the Mycobacterium avium subsp. paratuberculosis genome using new WGS data of sheep strain JIII-386 from Germany. Genome Biol Evol 2015; 7:2585-2601. [PMID: 26384038 PMCID: PMC4607514 DOI: 10.1093/gbe/evv154] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023] Open
Abstract
Mycobacterium avium (M. a.) subsp. paratuberculosis (MAP), the etiologic agent of Johne’s disease, affects cattle, sheep, and other ruminants worldwide. To decipher phenotypic differences between sheep and cattle strains (belonging to MAP-S [Type-I/III] and MAP-C [Type-II], respectively), comparative genome analysis needs data from diverse isolates originating from different geographic regions of the world. This study presents the best-assembled genome of a MAP-S strain to date: sheep isolate JIII-386 from Germany. One newly sequenced cattle isolate (JII-1961, Germany), four published MAP-C and MAP-S strains from the United States and Australia, and M. a. subsp. hominissuis (MAH) strain 104 were used for assembly improvement and comparisons. All genomes were annotated by BacProt, and the results were compared with the NCBI (National Center for Biotechnology Information) annotation. Corresponding protein-coding sequences (CDSs) were detected, as were CDSs determined exclusively by either NCBI or BacProt. A new Shine–Dalgarno sequence motif (5′-AGCTGG-3′) was extracted. Novel CDSs, including PE-PGRS family protein genes, and about 80 noncoding RNAs exhibiting high sequence conservation are presented. Previously found genetic differences between MAP types are partially revised. Four of ten assumed MAP-S-specific large sequence polymorphism regions (LSPSs) are still present in MAP-C strains; new LSPSs were identified. Independently of the regional origin of the strains, the number of individual CDSs and single nucleotide variants confirms the strong similarity of MAP-C strains and shows higher diversity among MAP-S strains. This study gives ambiguous results regarding the hypothesis that MAP-S is the evolutionary intermediate between MAH and MAP-C, but it clearly shows a higher similarity of MAP to MAH than to Mycobacterium intracellulare.
Collapse
Affiliation(s)
- Petra Möbius
- NRL for Paratuberculosis, Institute of Molecular Pathogenesis, Friedrich-Loeffler-Institut (Federal Research Institute for Animal Health), Naumburger Straße 96a, 07743 Jena, Germany
- Martin Hölzer
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
- Marius Felder
- Leibniz Institute for Age Research - Fritz-Lipmann-Institute (FLI), Beutenbergstraße 11, 07745 Jena, Germany
- Gabriele Nordsiek
- Department of Genome Analysis, Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany
- Marco Groth
- Leibniz Institute for Age Research - Fritz-Lipmann-Institute (FLI), Beutenbergstraße 11, 07745 Jena, Germany
- Heike Köhler
- NRL for Paratuberculosis, Institute of Molecular Pathogenesis, Friedrich-Loeffler-Institut (Federal Research Institute for Animal Health), Naumburger Straße 96a, 07743 Jena, Germany
- Kathrin Reichwald
- Leibniz Institute for Age Research - Fritz-Lipmann-Institute (FLI), Beutenbergstraße 11, 07745 Jena, Germany
- Matthias Platzer
- Leibniz Institute for Age Research - Fritz-Lipmann-Institute (FLI), Beutenbergstraße 11, 07745 Jena, Germany
- Manja Marz
- RNA Bioinformatics and High Throughput Analysis, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
16
Remarkably Divergent Regions Punctuate the Genome Assembly of the Caenorhabditis elegans Hawaiian Strain CB4856. Genetics 2015; 200:975-89. [PMID: 25995208 PMCID: PMC4512556 DOI: 10.1534/genetics.115.175950] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 03/02/2015] [Accepted: 04/29/2015] [Indexed: 01/24/2023] Open
Abstract
The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion–deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large, rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes.
17
Campana MG, Robles García NM, Tuross N. America's red gold: multiple lineages of cultivated cochineal in Mexico. Ecol Evol 2015; 5:607-17. [PMID: 25691985 PMCID: PMC4328766 DOI: 10.1002/ece3.1398] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/14/2014] [Revised: 12/15/2014] [Accepted: 12/18/2014] [Indexed: 01/31/2023] Open
Abstract
Cultivated cochineal (Dactylopius coccus) produces carminic acid, a valuable red dye used to color textiles, cosmetics, and food. Extant native D. coccus is largely restricted to two populations in the Mexican and the Andean highlands, although the insect's ultimate center of domestication remains unclear. Moreover, due to Mexican D. coccus cultivation's near demise during the 19th century, the genetic diversity of current cochineal stock is unknown. Through genomic sequencing, we identified two divergent D. coccus populations in highland Mexico: one unique to Mexico and another that was more closely related to extant Andean cochineal. Relic diversity is preserved in the crops of small-scale Mexican cochineal farmers. Conversely, larger-scale commercial producers are cultivating the Andean-like cochineal, which may reflect clandestine 20th century importation.
Collapse
Affiliation(s)
- Michael G Campana
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, Massachusetts, 02138
- Nelly M Robles García
- Proyecto Conjunto Monumental de Atzompa, Calle Reforma 501, esq. Constitución. Sala IV. Centro Histórico, Oaxaca, Oaxaca, 68000, Mexico
- Noreen Tuross
- Department of Human Evolutionary Biology, Harvard University, 11 Divinity Avenue, Cambridge, Massachusetts, 02138
18
Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl 2014; 7:1026-42. [PMID: 25553065 PMCID: PMC4231593 DOI: 10.1111/eva.12178] [Citation(s) in RCA: 188] [Impact Index Per Article: 18.8] [Received: 02/04/2014] [Accepted: 05/20/2014] [Indexed: 12/12/2022] Open
Abstract
Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.
Collapse
Affiliation(s)
- Robert Ekblom
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Jochen B W Wolf
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
19
Liang P, Zhang Y, Lin K, Hu J. A fast sequence assembly method based on compressed data structures. Annu Int Conf IEEE Eng Med Biol Soc 2014; 2014:326-329. [PMID: 25569963 DOI: 10.1109/embc.2014.6943595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Assembling a large genome from next generation sequencing reads requires a large amount of computer memory and a long execution time. To reduce these requirements, a memory- and time-efficient assembler called FMJ-Assembler is presented, which applies the FM-index in JR-Assembler; FM stands for the FMR-index, derived from the FM-index and the Burrows-Wheeler transform (BWT), and J stands for jumping extension. The FMJ-Assembler uses an expanded FM-index and BWT to compress the read data and save memory, and the jumping extension method reduces CPU time. An extensive comparison of the FMJ-Assembler with current assemblers shows that it achieves better or comparable overall assembly quality while requiring less memory and less CPU time. These advantages indicate that the FMJ-Assembler will be an efficient assembly method for next generation sequencing technology.
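The abstract above relies on the BWT and the FM-index to store reads compactly while still allowing fast substring queries. The following is a minimal sketch of that general idea in Python, assuming a plain textbook FM-index over a single short string; it is not the FMR-index or the jumping extension used by FMJ-Assembler, and the function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical names chosen for illustration.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes `text` does not contain '$', which is appended as the
    end-of-text sentinel. Quadratic in memory, so only suitable for
    short illustrative strings.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string.

    c_table[c]: number of characters in the text strictly smaller than c.
    occ[c][i]:  number of occurrences of c in bwt_str[:i].
    """
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Count matches of `pattern` by backward search over the FM-index.

    `n` is the length of the BWT string (text length plus sentinel).
    The interval [lo, hi) tracks the sorted suffixes that start with
    the part of the pattern processed so far.
    """
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # toy sequence, not real data
    transformed = bwt(reference)
    c_table, occ = build_fm_index(transformed)
    for query in ("AC", "CGT", "GG"):
        hits = count_occurrences(query, c_table, occ, len(transformed))
        print(query, hits)
```

Each query costs time proportional to the pattern length rather than the text length, which is the property that makes this family of indexes attractive for read overlap detection. A production index would replace the full `occ` table with sampled checkpoints to realize the memory savings.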