1
Lin A, Torres CM, Hobbs EC, Bardhan J, Aley SB, Spencer CT, Taylor KL, Chiang T. Computational and Systems Biology Advances to Enable Bioagent Agnostic Signatures. Health Secur 2024; 22:130-139. [PMID: 38483337 PMCID: PMC11044874 DOI: 10.1089/hs.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024] Open
Affiliation(s)
- Andy Lin
  - Andy Lin, PhD, is a Linus Pauling Distinguished Postdoctoral Fellow in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Cameron M. Torres
  - Cameron M. Torres is a Graduate Research Assistant and Wieland Fellow, Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
- Errett C. Hobbs
  - Errett C. Hobbs, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Jaydeep Bardhan
  - Jaydeep Bardhan, PhD, is a Research Line Manager, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA
- Stephen B. Aley
  - Stephen B. Aley, PhD, is a Professor, Biological Sciences, and an Associate Vice President for Research, Sponsored Projects, at the University of Texas at El Paso, El Paso, TX
- Charles T. Spencer
  - Charles T. Spencer, PhD, is an Associate Professor, Biological Sciences, and Edward and Barbara Brown Egbert Endowed Chair of the Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
- Karen L. Taylor
  - Karen L. Taylor, MS, is a Research Line Manager in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
- Tony Chiang
  - Tony Chiang, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
2
Varsamis GD, Karafyllidis IG, Gilkes KM, Arranz U, Martin-Cuevas R, Calleja G, Wong J, Jessen HC, Dimitrakis P, Kolovos P, Sandaltzopoulos R. Quantum algorithm for de novo DNA sequence assembly based on quantum walks on graphs. Biosystems 2023; 233:105037. [PMID: 37734700 DOI: 10.1016/j.biosystems.2023.105037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/16/2023] [Accepted: 09/18/2023] [Indexed: 09/23/2023]
Abstract
De novo DNA sequence assembly is based on finding paths in overlap graphs, which is an NP-hard problem. We developed a quantum algorithm for de novo assembly based on quantum walks in graphs. The overlap graph is partitioned repeatedly into smaller graphs that form a hierarchical structure. We use quantum walks to find paths in low rank graphs and a quantum algorithm that finds Hamiltonian paths in high hierarchical rank. We tested the partitioning quantum algorithm, as well as the quantum algorithm that finds Hamiltonian paths in high hierarchical rank, and confirmed their correct operation using Qiskit. We developed a custom simulation for quantum walks to search for paths in low rank graphs. The approach described in this paper may serve as a basis for the development of efficient quantum algorithms that solve the de novo DNA assembly problem.
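To make the underlying problem concrete, the following is a minimal classical sketch of assembly as a Hamiltonian path search in an overlap graph. It is not the quantum algorithm described in the abstract: the function names, the `min_len` threshold, and the toy reads are all assumptions chosen for illustration, and the exhaustive search over read orderings is only feasible for a handful of reads, which is exactly why the problem is hard at scale.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len: int = 3):
    """Directed overlap graph: maps an edge (a, b) to its suffix-prefix overlap length."""
    graph = {}
    for a in reads:
        for b in reads:
            if a != b:
                k = overlap(a, b, min_len)
                if k > 0:
                    graph[(a, b)] = k
    return graph


def assemble(reads, min_len: int = 3):
    """Brute-force Hamiltonian path search; returns the shortest contig found, or None."""
    graph = build_overlap_graph(reads, min_len)
    best = None
    for order in permutations(reads):
        pairs = list(zip(order, order[1:]))
        # A Hamiltonian path visits every read once, following only existing edges.
        if all(pair in graph for pair in pairs):
            contig = order[0]
            for a, b in pairs:
                contig += b[graph[(a, b)]:]  # append only the non-overlapping tail
            if best is None or len(contig) < len(best):
                best = contig
    return best


# Hypothetical toy reads, not data from the paper.
toy_reads = ["ATGGCG", "GCGTGC", "TGCAAT"]
print(assemble(toy_reads))  # ATGGCGTGCAAT
```

The number of orderings grows factorially with the number of reads, which is the cost that partitioning the graph into a hierarchy of smaller graphs is meant to contain.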
Affiliation(s)
- G D Varsamis
  - Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, 67100, Greece
- I G Karafyllidis
  - Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, 67100, Greece; National Centre for Scientific Research Demokritos, Athens, 15342, Greece
- K M Gilkes
  - EY Global Innovation Quantum Computing Lab, USA
- U Arranz
  - EY Global Innovation Quantum Computing Lab, Spain
- G Calleja
  - EY Global Innovation Quantum Computing Lab, Spain
- J Wong
  - EY Global Innovation Quantum Computing Lab, USA
- H C Jessen
  - EY Global Innovation Quantum Computing Lab, Denmark
- P Dimitrakis
  - National Centre for Scientific Research Demokritos, Athens, 15342, Greece
- P Kolovos
  - Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, 68100, Greece
- R Sandaltzopoulos
  - Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, 68100, Greece
3
Espinosa E, Bautista R, Fernandez I, Larrosa R, Zapata EL, Plata O. Comparing assembly strategies for third-generation sequencing technologies across different genomes. Genomics 2023; 115:110700. [PMID: 37598732 DOI: 10.1016/j.ygeno.2023.110700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 08/07/2023] [Accepted: 08/16/2023] [Indexed: 08/22/2023]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Accordingly, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth, and the assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that using PacBio high-fidelity (HiFi) long reads plays an essential role in haplotype construction with respect to ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and from combining them with HiFi or short reads.
Affiliation(s)
- Elena Espinosa
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
- Rocio Bautista
  - Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Ivan Fernandez
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, C. Jordi Girona, 1-3, Barcelona 08034, Spain.
- Rafael Larrosa
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Emilio L Zapata
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
- Oscar Plata
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
4
Chanama M, Prombutara P, Chanama S. Comparative genome features and secondary metabolite biosynthetic potential of Kutzneria chonburiensis and other species of the genus Kutzneria. Sci Rep 2023; 13:8794. [PMID: 37258607 DOI: 10.1038/s41598-023-36039-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 05/28/2023] [Indexed: 06/02/2023] Open
Abstract
Actinobacteria are well known as a rich source of diverse bioactive secondary metabolites. Kutzneria, a rare actinobacterial genus belonging to the family Pseudonocardiaceae, has an abundance of secondary metabolite biosynthetic gene clusters (BGCs), is an important source of natural products, and is worthy of priority investigation. Kutzneria chonburiensis SMC256T is currently the latest type strain of the genus, and its genome sequence has not yet been reported. Therefore, we present the first report of the complete genome sequence of SMC256T (genome size of 10.4 Mbp), with genome annotation and a feature comparison between SMC256T and other publicly available Kutzneria species. The results from comparative and functional genomic analyses, namely the phylogenomic and clusters of orthologous groups of proteins (COGs) analyses, indicated that SMC256T is most closely related to Kutzneria sp. 744, Kutzneria kofuensis, Kutzneria sp. CA-103260 and Kutzneria buriramensis. Furthermore, a total of 322 BGCs were detected and showed diversity among the Kutzneria genomes. Of these, 38 clusters showing best hits to known BGCs were predicted in the SMC256T genome. We observed that six clusters responsible for biosynthesis of antimicrobial/antitumor metabolites were strain-specific in Kutzneria chonburiensis. These putative metabolites include virginiamycin S1, lysolipin I, esmeraldin, rakicidin, aclacinomycin and streptoseomycin. Based on these findings, the genome of Kutzneria chonburiensis contains distinct and unidentified BGCs different from those of other members of the genus, and an integrative genomics-based approach would be a useful alternative way to target, isolate and identify putative and undiscovered secondary metabolites suspected to have new and/or specific bioactivity in Kutzneria.
Affiliation(s)
- Manee Chanama
  - Department of Microbiology, Faculty of Public Health, Mahidol University, Bangkok, 10400, Thailand
- Pinidphon Prombutara
  - Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Suchart Chanama
  - Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
5
Wang C, Wu DD, Yuan YH, Yao MC, Han JL, Wu YJ, Shan F, Li WP, Zhai JQ, Huang M, Peng SM, Cai QH, Yu JY, Liu QX, Liu ZY, Li LX, Teng MS, Huang W, Zhou JY, Zhang C, Chen W, Tu XL. Population genomic analysis provides evidence of the past success and future potential of South China tiger captive conservation. BMC Biol 2023; 21:64. [PMID: 37069598 PMCID: PMC10111772 DOI: 10.1186/s12915-023-01552-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 02/21/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND Among six extant tiger subspecies, the South China tiger (Panthera tigris amoyensis) once was widely distributed but is now the rarest one and extinct in the wild. All living South China tigers are descendants of only two male and four female wild-caught tigers and they survive solely in zoos after 60 years of effective conservation efforts. Inbreeding depression and hybridization with other tiger subspecies were believed to have occurred within the small, captive South China tiger population. It is therefore urgently needed to examine the genomic landscape of existing genetic variation among the South China tigers. RESULTS In this study, we assembled a high-quality chromosome-level genome using long-read sequences and re-sequenced 29 high-depth genomes of the South China tigers. By combining and comparing our data with the other 40 genomes of six tiger subspecies, we identified two significantly differentiated genomic lineages among the South China tigers, which harbored some rare genetic variants introgressed from other tiger subspecies and thus maintained a moderate genetic diversity. We noticed that the South China tiger had higher FROH values for longer runs of homozygosity (ROH > 1 Mb), an indication of recent inbreeding/founder events. We also observed that the South China tiger had the least frequent homozygous genotypes of both high- and moderate-impact deleterious mutations, and lower mutation loads than both Amur and Sumatran tigers. Altogether, our analyses indicated an effective genetic purging of deleterious mutations in homozygous states from the South China tiger, following its population contraction with a controlled increase in inbreeding based on its pedigree records. 
CONCLUSIONS The identification of two unique founder/genomic lineages coupled with active genetic purging of deleterious mutations in homozygous states and the genomic resources generated in our study pave the way for a genomics-informed conservation, following the real-time monitoring and rational exchange of reproductive South China tigers among zoos.
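The FROH statistic mentioned in the results can be illustrated with a short sketch. The abstract does not give the exact formula or tooling the authors used, so this assumes the conventional definition (total length of runs of homozygosity above a length threshold, divided by the length of the genome surveyed). The function name, the half-open interval convention, and the example numbers are hypothetical.

```python
def froh(roh_segments, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the genome covered by ROH segments longer than min_length_bp.

    roh_segments: iterable of (start, end) positions in bp, assumed non-overlapping
    and half-open, so each segment's length is end - start.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start > min_length_bp  # keep only long ROH (here, > 1 Mb)
    )
    return covered / genome_length_bp


# Hypothetical individual: two long ROH and one short ROH on a 100 Mb genome.
segments = [(0, 2_500_000), (5_000_000, 5_400_000), (10_000_000, 11_500_000)]
print(froh(segments, 100_000_000))
```

Only the 2.5 Mb and 1.5 Mb segments pass the threshold in this example, so 4 Mb of 100 Mb is counted. Restricting the sum to long segments is what makes the statistic sensitive to recent rather than ancient inbreeding, since long runs have had less time to be broken up by recombination.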
Affiliation(s)
- Chen Wang
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Dong-Dong Wu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
  - Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, Yunnan, China
  - Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming, 650204, China
- Meng-Cheng Yao
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
  - Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, Yunnan, China
  - Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming, 650204, China
- Jian-Lin Han
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
  - International Livestock Research Institute (ILRI), Nairobi, 00100, Kenya
- Ya-Jiang Wu
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Fen Shan
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Wan-Ping Li
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Jun-Qiong Zhai
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Mian Huang
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Shi-Ming Peng
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Qin-Hui Cai
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Lin-Xiang Li
  - Suzhou Shangfangshan Forest Zoo, Suzhou, 215009, China
- Wei Huang
  - Nanchang Zoo, Nanchang, 330025, China
- Jun-Ying Zhou
  - Chinese Association of Zoological Gardens, Beijing, 100037, China
- Chi Zhang
  - Qinghai Province Key Laboratory of Crop Molecular Breeding, Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810008, Qinghai, China
- Wu Chen
  - Guangzhou Zoo & Guangzhou Wildlife Research Center, Guangzhou, 510070, China
- Xiao-Long Tu
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
  - Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, Yunnan, China
  - Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming, 650204, China
6
Nykrynova M, Barton V, Bezdicek M, Lengerova M, Skutkova H. Identification of highly variable sequence fragments in unmapped reads for rapid bacterial genotyping. BMC Genomics 2022; 23:445. [PMID: 36581824 PMCID: PMC9798552 DOI: 10.1186/s12864-022-08550-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 04/14/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing are currently used in routine clinical practice. However, these methods are costly, time-consuming and have high computational demands. An alternative to these methods is mini-MLST, a quick, cost-effective and robust method based on high-resolution melting analysis. Nevertheless, no standardized approach to identify markers suitable for mini-MLST exists. Here, we present a pipeline for variable fragment detection in unmapped reads based on a modified hybrid assembly approach using data from one sequencing platform.
RESULTS In routine assembly against the reference sequence, highly variable reads are not aligned and remain unmapped. If these reads are assembled de novo, variable genomic regions can be located in the resulting scaffolds. Based on the variability rates calculation, it is possible to find a highly variable region with the same discriminatory power as seven housekeeping gene fragments used in MLST. In the work presented here, we show the capability of identifying one variable fragment in de novo assembled scaffolds of 21 Escherichia coli genomes and three variable regions in scaffolds of 31 Klebsiella pneumoniae genomes. For each identified fragment, the melting temperatures are calculated based on the nearest neighbor method to verify the mini-MLST's discriminatory power.
CONCLUSIONS A pipeline for a modified hybrid assembly approach consisting of reference-based mapping and de novo assembly of unmapped reads is presented. This approach can be employed for the identification of highly variable genomic fragments in unmapped reads. The identified variable regions can then be used in efficient laboratory methods for bacterial typing such as mini-MLST with high discriminatory power, fully replacing expensive methods such as MLST. The results can and will be delivered in a shorter time, which allows immediate and fast infection monitoring in clinical practice.
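The first step of the pipeline, separating reads that failed to map to the reference so they can be assembled de novo, can be sketched as follows. The abstract does not name the aligner or file formats used, so this assumes the alignments are available as SAM text, where bit 0x4 of the FLAG field marks an unmapped segment. The function name and the example records are hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for every unmapped record in SAM-formatted lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Hypothetical SAM content: one mapped read (FLAG 0) and one unmapped read (FLAG 4).
example_sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t6M\t*\t0\t0\tACGTAC\tIIIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACA\tIIIIII\n",
]
for name, seq in unmapped_reads(example_sam):
    print(f">{name}\n{seq}")  # FASTA-style output, ready to pass to a de novo assembler
```

In practice this filtering is usually delegated to an established alignment toolkit; the sketch only shows which reads the subsequent de novo assembly would receive.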
Affiliation(s)
- Marketa Nykrynova
  - grid.4994.0 0000 0001 0118 0988, Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czechia
- Vojtech Barton
  - grid.4994.0 0000 0001 0118 0988, Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czechia
- Matej Bezdicek
  - grid.412554.3 0000 0004 0609 2751, Department of Internal Medicine, Hematology and Oncology, University Hospital Brno, Brno, Czechia
- Martina Lengerova
  - grid.412554.3 0000 0004 0609 2751, Department of Internal Medicine, Hematology and Oncology, University Hospital Brno, Brno, Czechia
- Helena Skutkova
  - grid.4994.0 0000 0001 0118 0988, Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czechia
7
ONT-Based Alternative Assemblies Impact on the Annotations of Unique versus Repetitive Features in the Genome of a Romanian Strain of Drosophila melanogaster. Int J Mol Sci 2022; 23:ijms232314892. [PMID: 36499217 PMCID: PMC9741293 DOI: 10.3390/ijms232314892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 11/21/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022] Open
Abstract
To date, different strategies of whole-genome sequencing (WGS) have been developed in order to understand the genome structure and functions. However, the analysis of genomic sequences obtained from natural populations is challenging and the biological interpretation of sequencing data remains the main issue. The MinION device developed by Oxford Nanopore Technologies (ONT) is able to generate long reads with minimal costs and time requirements. These valuable assets qualify it as a suitable method for performing WGS, especially in small laboratories. The long reads produced by this sequencing approach can cover large structural variants and repetitive sequences commonly present in the genomes of eukaryotes. Using MinION, we performed two WGS assessments of a Romanian local strain of Drosophila melanogaster, referred to as Horezu_LaPeri (Horezu). In total, 1,317,857 reads with a size of 8.9 gigabytes (Gb) were generated. Canu and Flye de novo assembly tools were employed to obtain four distinct assemblies with both unfiltered and filtered reads, achieving maximum reference genome coverages of 94.8% (Canu) and 91.4% (Flye). In order to test the quality of these assemblies, we performed a two-step evaluation. Firstly, we considered the BUSCO scores and searched for a supplemental set of genes using BLAST. Subsequently, we appraised the total content of natural transposons (NTs) relative to the reference genome (ISO1 strain) and mapped the mdg1 retroelement as a resolution assayer. Our results reveal that filtered data provide only slightly enhanced results when considering gene identification, but the use of unfiltered data had a consistent positive impact on the global evaluation of the NTs content. Our comparative studies also revealed differences between Flye and Canu assemblies regarding the annotation of unique versus repetitive genomic features. In our hands, Flye proved to be moderately better for gene identification, while Canu clearly outperformed Flye for NTs analysis. Data concerning the NTs content were compared to those obtained with ONT for the D. melanogaster ISO1 strain, revealing that our strategy led to better results. Additionally, the parameters of our ONT reads and assemblies are similar to those reported for ONT experiments performed on various model organisms, revealing that our assembly data are appropriate for a proficient annotation of the Horezu genome.
8
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome-assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
9
Gaulke CA, Schmeltzer ER, Dasenko M, Tyler BM, Vega Thurber R, Sharpton TJ. Evaluation of the Effects of Library Preparation Procedure and Sample Characteristics on the Accuracy of Metagenomic Profiles. mSystems 2021; 6:e0044021. [PMID: 34636674 PMCID: PMC8510527 DOI: 10.1128/msystems.00440-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/10/2021] [Accepted: 09/18/2021] [Indexed: 11/20/2022] Open
Abstract
Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera Flex kit, perform when applied to metagenomic DNA. We measured each method's ability to produce metagenomic data that accurately represent the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral associated, and mouse gut associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method's performance, indicating that careful consideration may be needed when selecting between methods, especially for low-complexity communities. However, the cost-effective preparation methods that we assessed are generally comparable to the current gold-standard Nextera DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies.
IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. However, widely accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method.
Affiliation(s)
- Christopher A. Gaulke
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Department of Pathobiology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Mark Dasenko
  - Center for Quantitative Life Sciences, Oregon State University, Corvallis, Oregon, USA
- Brett M. Tyler
  - Center for Quantitative Life Sciences, Oregon State University, Corvallis, Oregon, USA
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Thomas J. Sharpton
  - Department of Microbiology, Oregon State University, Corvallis, Oregon, USA
  - Center for Quantitative Life Sciences, Oregon State University, Corvallis, Oregon, USA
  - Department of Statistics, Oregon State University, Corvallis, Oregon, USA
10
Kutnjak D, Tamisier L, Adams I, Boonham N, Candresse T, Chiumenti M, De Jonghe K, Kreuze JF, Lefebvre M, Silva G, Malapi-Wight M, Margaria P, Mavrič Pleško I, McGreig S, Miozzi L, Remenant B, Reynard JS, Rollin J, Rott M, Schumpp O, Massart S, Haegeman A. A Primer on the Analysis of High-Throughput Sequencing Data for Detection of Plant Viruses. Microorganisms 2021; 9:841. [PMID: 33920047 PMCID: PMC8071028 DOI: 10.3390/microorganisms9040841] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/14/2021] [Revised: 04/09/2021] [Accepted: 04/10/2021] [Indexed: 12/12/2022] Open
Abstract
High-throughput sequencing (HTS) technologies have become indispensable tools assisting plant virus diagnostics and research thanks to their ability to detect any plant virus in a sample without prior knowledge. As HTS technologies rely heavily on bioinformatic analysis of the huge number of generated sequences, it is of utmost importance that researchers can rely on efficient and reliable bioinformatic tools and can understand the principles, advantages, and disadvantages of the tools used. Here, we present a critical overview of the steps involved in HTS as employed for plant virus detection and virome characterization. We start from sample preparation and nucleic acid extraction as appropriate to the chosen HTS strategy, which is followed by basic data analysis requirements, an extensive overview of the in-depth data processing options, and taxonomic classification of viral sequences detected. By presenting the bioinformatic tools and a detailed overview of the consecutive steps that can be used to implement a well-structured HTS data analysis in an easy and accessible way, this paper is targeted at both beginners and expert scientists engaging in HTS plant virome projects.
Collapse
Affiliation(s)
- Denis Kutnjak
- Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, 1000 Ljubljana, Slovenia
| | - Lucie Tamisier
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium; (L.T.); (J.R.); (S.M.)
| | - Ian Adams
- Fera Science Limited, York YO41 1LZ, UK; (I.A.); (S.M.)
| | - Neil Boonham
- Institute for Agri-Food Research and Innovation, Newcastle University, King’s Rd, Newcastle Upon Tyne NE1 7RU, UK;
| | - Thierry Candresse
- UMR 1332 Biologie du Fruit et Pathologie, INRA, University of Bordeaux, 33140 Villenave d’Ornon, France; (T.C.); (M.L.)
| | - Michela Chiumenti
- Institute for Sustainable Plant Protection, National Research Council, Via Amendola, 122/D, 70126 Bari, Italy;
| | - Kris De Jonghe
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food, Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium; (K.D.J.); (A.H.)
| | - Jan F. Kreuze
- International Potato Center (CIP), Avenida la Molina 1895, La Molina, Lima 15023, Peru;
| | - Marie Lefebvre
- UMR 1332 Biologie du Fruit et Pathologie, INRA, University of Bordeaux, 33140 Villenave d’Ornon, France; (T.C.); (M.L.)
| | - Gonçalo Silva
- Natural Resources Institute, University of Greenwich, Central Avenue, Chatham Maritime, Kent ME4 4TB, UK;
| | - Martha Malapi-Wight
- Biotechnology Risk Analysis Programs, Biotechnology Regulatory Services, Animal and Plant Health Inspection Service, U.S. Department of Agriculture, Riverdale, MD 20737, USA;
| | - Paolo Margaria
- Leibniz Institute-DSMZ, Inhoffenstrasse 7b, 38124 Braunschweig, Germany;
| | - Irena Mavrič Pleško
- Agricultural Institute of Slovenia, Hacquetova Ulica 17, 1000 Ljubljana, Slovenia;
| | - Sam McGreig
- Fera Science Limited, York YO41 1LZ, UK; (I.A.); (S.M.)
| | - Laura Miozzi
- Institute for Sustainable Plant Protection, National Research Council of Italy (IPSP-CNR), Strada delle Cacce 73, 10135 Torino, Italy;
| | - Benoit Remenant
- ANSES Plant Health Laboratory, 7 Rue Jean Dixméras, CEDEX 01, 49044 Angers, France;
| | | | - Johan Rollin
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium; (L.T.); (J.R.); (S.M.)
- DNAVision, 6041 Charleroi, Belgium
| | - Mike Rott
- Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Rd, North Saanich, BC V8L 1H3, Canada;
| | - Olivier Schumpp
- Agroscope, Route de Duillier 50, 1260 Nyon, Switzerland; (J.-S.R.); (O.S.)
| | - Sébastien Massart
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium; (L.T.); (J.R.); (S.M.)
| | - Annelies Haegeman
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food, Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium; (K.D.J.); (A.H.)
| |
|
11
|
Analysis of Gene Expression Changes in Plants Grown in Salty Soil in Response to Inoculation with Halophilic Bacteria. Int J Mol Sci 2021; 22:ijms22073611. [PMID: 33807153 PMCID: PMC8036567 DOI: 10.3390/ijms22073611] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/17/2021] [Revised: 03/25/2021] [Accepted: 03/27/2021] [Indexed: 12/24/2022] Open
Abstract
Soil salinity is an increasing problem facing agriculture in many parts of the world. Climate change and irrigation practices have led to decreased yields of some farmland due to increased salt levels in the soil. Plants that have tolerance to salt are thus needed to feed the world's population. One approach to addressing this problem is genetic engineering to introduce genes encoding salinity tolerance, but this approach has limitations. Another fairly new approach is the isolation and development of salt-tolerant (halophilic) plant-associated bacteria. These bacteria are used as inoculants to stimulate plant growth. Several reports are now available demonstrating how the use of halophilic inoculants enhances plant growth in salty soil. However, the mechanisms for this growth stimulation are not yet clear. Enhanced growth in response to bacterial inoculation is expected to be associated with changes in plant gene expression. In this review, we discuss the current literature and approaches for analyzing altered plant gene expression in response to inoculation with halophilic bacteria. Additionally, challenges and limitations to current approaches are analyzed. A further understanding of the molecular mechanisms involved in enhanced plant growth when inoculated with salt-tolerant bacteria will significantly improve agriculture in areas affected by saline soils.
|
12
|
Perkins V, Vignola S, Lessard MH, Plante PL, Corbeil J, Dugat-Bony E, Frenette M, Labrie S. Phenotypic and Genetic Characterization of the Cheese Ripening Yeast Geotrichum candidum. Front Microbiol 2020; 11:737. [PMID: 32457706 PMCID: PMC7220993 DOI: 10.3389/fmicb.2020.00737] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/25/2019] [Accepted: 03/30/2020] [Indexed: 01/04/2023] Open
Abstract
The yeast Geotrichum candidum (teleomorph Galactomyces candidus) is inoculated onto mold- and smear-ripened cheeses and plays several roles during cheese ripening. Its ability to metabolize proteins, lipids, and organic acids enables its growth on the cheese surface and promotes the development of organoleptic properties. Recent multilocus sequence typing (MLST) and phylogenetic analyses of G. candidum isolates revealed substantial genetic diversity, which may explain its strain-dependent technological capabilities. Here, we aimed to shed light on the phenotypic and genetic diversity among eight G. candidum and three Galactomyces spp. strains of environmental and dairy origin. Phenotypic tests such as carbon assimilation profiles, the ability to grow at 35°C, and morphological traits on agar plates allowed us to discriminate G. candidum from Galactomyces spp. The genomes of these isolates were sequenced and assembled; whole genome comparison clustered the G. candidum strains into three subgroups and provided a reliable reference for MLST scheme optimization. Using the whole genome sequence as a reference, we optimized an MLST scheme using six loci that were proposed in two previous MLST schemes. This new MLST scheme allowed us to identify 15 sequence types (STs) out of 41 strains and revealed three major complexes named GeoA, GeoB, and GeoC. The population structure of these 41 strains was evaluated with STRUCTURE and a NeighborNet analysis of the combined six loci, which revealed recombination events between and within the complexes. These results hint that the allele variation conferring the different STs arose from recombination events. Recombination occurred for the six housekeeping genes studied, but most likely occurred throughout the genome. These recombination events may have induced an adaptive divergence between the wild strains and the cheesemaking strains, as observed for other cheese ripening fungi. Further comparative genomic studies are needed to confirm this phenomenon in G. candidum. In conclusion, the draft assembly of 11 G. candidum/Galactomyces spp. genomes allowed us to optimize a genotyping MLST scheme and, combined with the assessment of their ability to grow under different conditions, provides a reliable tool to cluster and eventually improve the selection of G. candidum strains.
Affiliation(s)
- Vincent Perkins
- Department of Food Sciences and Nutrition, STELA Dairy Research Center, Institute of Nutrition and Functional Foods, Université Laval, Quebec City, QC, Canada
| | - Stéphanie Vignola
- Department of Food Sciences and Nutrition, STELA Dairy Research Center, Institute of Nutrition and Functional Foods, Université Laval, Quebec City, QC, Canada
| | - Marie-Hélène Lessard
- Department of Food Sciences and Nutrition, STELA Dairy Research Center, Institute of Nutrition and Functional Foods, Université Laval, Quebec City, QC, Canada
| | - Pier-Luc Plante
- Big Data Research Center, Université Laval, Quebec City, QC, Canada
| | - Jacques Corbeil
- Big Data Research Center, Université Laval, Quebec City, QC, Canada
| | - Eric Dugat-Bony
- Department of Food Sciences and Nutrition, STELA Dairy Research Center, Institute of Nutrition and Functional Foods, Université Laval, Quebec City, QC, Canada
- Université Paris-Saclay, INRAE, AgroParisTech, UMR SayFood, Thiverval-Grignon, France
| | - Michel Frenette
- Oral Ecology Research Group, Faculty of Dental Medicine, Université Laval, Quebec City, QC, Canada
- Faculty of Science and Engineering, Department of Biochemistry, Microbiology, and Bioinformatics, Université Laval, Quebec City, QC, Canada
| | - Steve Labrie
- Department of Food Sciences and Nutrition, STELA Dairy Research Center, Institute of Nutrition and Functional Foods, Université Laval, Quebec City, QC, Canada
| |
|
13
|
Sato MP, Ogura Y, Nakamura K, Nishida R, Gotoh Y, Hayashi M, Hisatsune J, Sugai M, Takehiko I, Hayashi T. Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes. DNA Res 2020; 26:391-398. [PMID: 31364694 PMCID: PMC6796507 DOI: 10.1093/dnares/dsz017] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 04/05/2019] [Accepted: 07/17/2019] [Indexed: 01/23/2023] Open
Abstract
In bacterial genome and metagenome sequencing, Illumina sequencers are most frequently used due to their high throughput capacity, and multiple library preparation kits have been developed for Illumina platforms. Here, we systematically analysed and compared the sequencing bias generated by currently available library preparation kits for Illumina sequencing. Our analyses revealed that a strong sequencing bias is introduced in low-GC regions by the Nextera XT kit. The level of bias introduced is dependent on the level of GC content; stronger bias is generated as the GC content decreases. Other analysed kits did not introduce this strong sequencing bias. The GC content-associated sequencing bias introduced by Nextera XT was more remarkable in metagenome sequencing of a mock bacterial community and seriously affected estimation of the relative abundance of low-GC species. The results of our analyses highlight the importance of selecting proper library preparation kits according to the purposes and targets of sequencing, particularly in metagenome sequencing, where a wide range of microbial species with various degrees of GC content is present. Our data also indicate that special attention should be paid to which library preparation kit was used when analysing and interpreting publicly available metagenomic data.
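As a rough illustration of the GC-dependent bias described in this abstract, the following minimal Python sketch pairs the GC fraction of fixed-size windows with their mean read depth. It is not the authors' analysis: the function names (`gc_content`, `gc_vs_depth`), the non-overlapping window layout, and the per-base depth list are all assumptions made for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    total = gc + at
    return gc / total if total else 0.0


def gc_vs_depth(seq: str, depth: list[float], window: int) -> list[tuple[float, float]]:
    """Pair each non-overlapping window's GC fraction with its mean depth.

    Assumes `depth` holds one read-depth value per base of `seq`
    (a hypothetical input format chosen for this sketch).
    """
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")
    pairs = []
    for start in range(0, len(seq) - window + 1, window):
        chunk_depth = depth[start:start + window]
        pairs.append((gc_content(seq[start:start + window]),
                      sum(chunk_depth) / window))
    return pairs
```

Plotting the resulting pairs for each library would show whether mean depth falls off in low-GC windows, which is the pattern the abstract attributes to Nextera XT.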
Affiliation(s)
- Mitsuhiko P Sato
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Yoshitoshi Ogura
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Keiji Nakamura
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Ruriko Nishida
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan; Department of Medicine and Biosystemic Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Yasuhiro Gotoh
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| | - Masahiro Hayashi
- Division of Anaerobe Research, Life Science Research Center, Gifu University, Gifu, Gifu, Japan; Center for Conservation of Microbial Genetic Resource, Gifu University, Gifu, Gifu, Japan
| | - Junzo Hisatsune
- Project Research Center for Nosocomial Infectious Diseases, Hiroshima University, Hiroshima, Hiroshima, Japan; Department of Bacteriology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Hiroshima, Japan; Antimicrobial Resistance Research Center, National Institute of Infectious Diseases, Tokyo, Japan
| | - Motoyuki Sugai
- Project Research Center for Nosocomial Infectious Diseases, Hiroshima University, Hiroshima, Hiroshima, Japan; Department of Bacteriology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Hiroshima, Japan; Antimicrobial Resistance Research Center, National Institute of Infectious Diseases, Tokyo, Japan
| | - Itoh Takehiko
- Department of Biological Information, Tokyo Institute of Technology, Tokyo, Japan
| | - Tetsuya Hayashi
- Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan
| |
|
14
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
| |
|
15
|
Eisfeldt J, Mårtensson G, Ameur A, Nilsson D, Lindstrand A. Discovery of Novel Sequences in 1,000 Swedish Genomes. Mol Biol Evol 2020; 37:18-30. [PMID: 31560401 PMCID: PMC6984370 DOI: 10.1093/molbev/msz176] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022] Open
Abstract
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NS across the chimpanzee genome, we find that 2,807 NS align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NS common throughout the Swedish population. The NSs were searched for repeats and repeat elements: revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NS, as well as the previously published Pan-African NS: revealing that both the Swedish and Pan-African NS are widespread, and that the Swedish NSs are largely a subset of the Pan-African NS. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NS may be of ancestral origin.
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Gustaf Mårtensson
- Division of Nanobiotechnology, Department of Protein Science, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Daniel Nilsson
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| |
|
16
|
Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. QUANTITATIVE BIOLOGY 2019. [DOI: 10.1007/s40484-019-0181-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
|
17
|
Implications of Mobile Genetic Elements for Salmonella enterica Single-Nucleotide Polymorphism Subtyping and Source Tracking Investigations. Appl Environ Microbiol 2019; 85:AEM.01985-19. [PMID: 31585993 DOI: 10.1128/aem.01985-19] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/28/2019] [Accepted: 09/30/2019] [Indexed: 12/20/2022] Open
Abstract
Single-nucleotide polymorphisms (SNPs) are widely used for whole-genome sequencing (WGS)-based subtyping of foodborne pathogens in outbreak and source tracking investigations. Mobile genetic elements (MGEs) are commonly present in bacterial genomes and may affect SNP subtyping results if their evolutionary history and dynamics differ from those of the bacterial chromosomes. Using Salmonella enterica as a model organism, we surveyed major categories of MGEs, including plasmids, phages, insertion sequences, integrons, and integrative and conjugative elements (ICEs), in 990 genomes representing 21 major serotypes of S. enterica. We evaluated whether plasmids and chromosomal MGEs affect SNP subtyping with 9 outbreak clusters of different serotypes found in the United States in 2018. The median total length of chromosomal MGEs accounted for 2.5% of a typical S. enterica chromosome. Of the 990 analyzed S. enterica isolates, 68.9% contained at least one assembled plasmid sequence. The median total length of assembled plasmids in these isolates was 93,671 bp. Plasmids that carry high densities of SNPs were found to substantially affect both SNP phylogenies and SNP distances among closely related isolates if they were present in the reference genome for SNP subtyping. In comparison, chromosomal MGEs were found to have limited impact on SNP subtyping. We recommend the identification of plasmid sequences in the reference genome and the exclusion of plasmid-borne SNPs from SNP subtyping analysis. IMPORTANCE: Despite increasingly routine use of WGS and SNP subtyping in outbreak and source tracking investigations, whether and how MGEs affect SNP subtyping has not been thoroughly investigated. Besides chromosomal MGEs, plasmids are frequently entangled in draft genome assemblies and have yet to be assessed for their impact on SNP subtyping. This study provides evidence-based guidance on the treatment of MGEs in SNP analysis for Salmonella to infer phylogenetic relationship and SNP distance between isolates.
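The recommendation to exclude plasmid-borne SNPs can be sketched as a pairwise SNP distance that skips sites on flagged contigs. This is a minimal Python sketch under stated assumptions, not the pipeline used in the study: the function name `snp_distance`, the `(contig, position) -> base` dictionary layout, and the contig labels are hypothetical.

```python
def snp_distance(calls_a: dict, calls_b: dict, excluded_contigs=frozenset()) -> int:
    """Count sites where two isolates carry different bases.

    calls_a / calls_b map (contig, position) -> base call against a shared
    reference (an assumed layout). Sites missing in either isolate are
    ignored, as are sites on any contig listed in `excluded_contigs`
    (for example, contigs identified as plasmids).
    """
    distance = 0
    for site, base_a in calls_a.items():
        contig, _position = site
        if contig in excluded_contigs:
            continue
        base_b = calls_b.get(site)
        if base_b is not None and base_b != base_a:
            distance += 1
    return distance


# Hypothetical toy data: one chromosomal difference, one plasmid difference.
isolate_1 = {("chromosome", 10): "A", ("chromosome", 20): "C", ("plasmid_1", 5): "G"}
isolate_2 = {("chromosome", 10): "A", ("chromosome", 20): "T", ("plasmid_1", 5): "T"}
```

With these toy inputs the distance drops when `plasmid_1` is excluded, which mirrors the effect on closely related isolates that the abstract describes.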
|
18
|
Eren K, Murrell B. RIFRAF: a frame-resolving consensus algorithm. Bioinformatics 2019; 34:3817-3824. [PMID: 29850783 DOI: 10.1093/bioinformatics/bty426] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/30/2017] [Accepted: 05/22/2018] [Indexed: 01/08/2023] Open
Abstract
Motivation: Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Results: Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. Availability and implementation: RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kemal Eren
- Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Ben Murrell
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| |
|
19
|
Tarrant AM, Nilsson B, Hansen BW. Molecular physiology of copepods - from biomarkers to transcriptomes and back again. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY D-GENOMICS & PROTEOMICS 2019; 30:230-247. [DOI: 10.1016/j.cbd.2019.03.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/20/2018] [Revised: 03/14/2019] [Accepted: 03/16/2019] [Indexed: 12/31/2022]
|
20
|
Tian S, Yan H, Klee EW, Kalmbach M, Slager SL. Comparative analysis of de novo assemblers for variation discovery in personal genomes. Brief Bioinform 2019; 19:893-904. [PMID: 28407084 PMCID: PMC6169673 DOI: 10.1093/bib/bbx037] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/12/2016] [Accepted: 03/08/2017] [Indexed: 12/30/2022] Open
Abstract
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, regions of duplicates with a high-sequence similarity and regions of high-sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a Utah residents with Northern and Western European ancestry (CEU) trio, showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.
Affiliation(s)
- Shulan Tian
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Huihuang Yan
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Eric W Klee
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine Bioinformatics Program, Mayo Clinic, USA
| | - Michael Kalmbach
- Division of Information Management and Analytics, Department of Information Technology, Mayo Clinic, USA
| | - Susan L Slager
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| |
|
21
|
He D. The mitochondrial genome of the bamboo false cobra (Pseudoxenodon bambusicola). Mitochondrial DNA B Resour 2019. [DOI: 10.1080/23802359.2019.1574630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022] Open
Affiliation(s)
- Dan He
- Faculty of Basic Medicine Sichuan College of Traditional Chinese Medicine, Mianyang, Sichuan, PR China
| |
|
22
|
Liu Y, Jia Y, Liu C, Ding L, Xia Z. RNA-Seq transcriptome analysis of breast muscle in Pekin ducks supplemented with the dietary probiotic Clostridium butyricum. BMC Genomics 2018; 19:844. [PMID: 30486769 PMCID: PMC6264624 DOI: 10.1186/s12864-018-5261-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/22/2018] [Accepted: 11/16/2018] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: Increased attention is being paid to breast muscle yield and meat quality in the duck breeding industry. Our previous report demonstrated that dietary Clostridium butyricum (C. butyricum) can improve meat quality of Pekin ducks. However, the potential biological processes and molecular mechanisms that are modulated by dietary C. butyricum in the breast muscle of Pekin ducks remain unknown. RESULTS: Supplementation with C. butyricum increased growth performance and meat yield. Therefore, we utilized de novo assembly methods to analyze the RNA-Seq transcriptome profiles in breast muscle to explore the differentially expressed genes between C. butyricum-treated and control Pekin ducks. A total of 1119 differentially expressed candidate genes were found, of which 403 genes were significantly up-regulated and 716 genes were significantly down-regulated. qRT-PCR analysis was used to confirm the accuracy of the RNA-Seq results. GO annotations revealed potential genes, processes and pathways that may participate in meat quality and muscle development. KEGG pathway analysis showed that the differentially expressed genes participated in numerous pathways related to muscle development, including ECM-receptor interaction, the MAPK signaling pathway and the TNF signaling pathway. CONCLUSIONS: This study suggests that long-term dietary supplementation with C. butyricum can modulate muscle development and meat quality by altering the expression patterns of genes involved in crucial metabolic pathways. The findings presented here provide unique insights into the molecular mechanisms of muscle development in Pekin ducks in response to dietary C. butyricum.
Affiliation(s)
- Yanhan Liu
- College of Veterinary Medicine, China Agricultural University, Beijing, 100193 China
| | - Yaxiong Jia
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Cun Liu
- Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Limin Ding
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Zhaofei Xia
- College of Veterinary Medicine, China Agricultural University, Beijing, 100193 China
| |
|
23
|
Sohn JI, Nam JW. The present and future of de novo whole-genome assembly. Brief Bioinform 2018; 19:23-40. [PMID: 27742661 DOI: 10.1093/bib/bbw096] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 06/03/2016] [Indexed: 12/15/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many bright ideas and heuristics have been suggested to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graph (Hamiltonian or Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, long read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome.
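The Eulerian de Bruijn formulation mentioned in this abstract can be illustrated with a toy example: k-mers become edges between their (k-1)-mer prefix and suffix, and a path that uses every edge once spells a candidate sequence. The following is a minimal Python sketch, not any particular assembler; the function names are hypothetical, and it assumes error-free, single-strand k-mers that form a non-empty graph with an Eulerian path.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Build an edge list: each k-mer links its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise the graph is a cycle and any node with edges will do.
    start = next(
        (node for node, targets in graph.items()
         if len(targets) - in_degree[node] == 1),
        next(iter(graph)),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(kmers):
    """Spell the sequence along one Eulerian path of the k-mer graph."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])
```

When a (k-1)-mer repeats, several Eulerian paths may exist and the sketch simply returns one of them, which is a small-scale version of the assembly ambiguity the review discusses.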
|
24
|
Jung J, Yi G. A performance analysis of genome search by matching whole targeted reads on different environments. Soft comput 2018. [DOI: 10.1007/s00500-018-3573-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
|
25
|
Solares EA, Chakraborty M, Miller DE, Kalsow S, Hall K, Perera AG, Emerson JJ, Hawley RS. Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing. G3 (BETHESDA, MD.) 2018; 8:3143-3154. [PMID: 30018084 PMCID: PMC6169397 DOI: 10.1534/g3.118.200162] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 02/16/2018] [Accepted: 07/03/2018] [Indexed: 12/27/2022]
Abstract
Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila melanogaster reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of D. melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hr. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous de novo genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the D. melanogaster reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. Furthermore, our assembly and comparison to the D. melanogaster reference genome demonstrates that high-quality de novo assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD).
Affiliation(s)
- Edwin A Solares
- Department of Ecology and Evolutionary Biology, University of California Irvine, CA
- Mahul Chakraborty
- Department of Ecology and Evolutionary Biology, University of California Irvine, CA
- Danny E Miller
- Stowers Institute for Medical Research, Kansas City, MO
- MD-PhD Physician Scientist Training Program, University of Kansas Medical Center, Kansas City, KS
- Shannon Kalsow
- Department of Ecology and Evolutionary Biology, University of California Irvine, CA
- Kate Hall
- Stowers Institute for Medical Research, Kansas City, MO
- J J Emerson
- Department of Ecology and Evolutionary Biology, University of California Irvine, CA
- R Scott Hawley
- Stowers Institute for Medical Research, Kansas City, MO
- Department of Molecular and Integrative Physiology, University of Kansas Medical Center, Kansas City, KS
|
26
|
SCOP: a novel scaffolding algorithm based on contig classification and optimization. Bioinformatics 2018; 35:1142-1150. [DOI: 10.1093/bioinformatics/bty773] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/20/2017] [Revised: 08/10/2018] [Accepted: 09/01/2018] [Indexed: 12/20/2022] Open
|
27
|
Li M, Tang L, Liao Z, Luo J, Wu F, Pan Y, Wang J. A novel scaffolding algorithm based on contig error correction and path extension. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:764-773. [PMID: 30040649 DOI: 10.1109/tcbb.2018.2858267] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
The sequence assembly process can be divided into three stages: contig extension, scaffolding, and gap filling. Scaffolding is an essential step in this process that infers the direction and order relationships between the contigs. However, scaffolding still faces the challenges of uneven sequencing depth, repetitive genomic regions, and sequencing errors, which often lead to many false relationships between contigs. The performance of scaffolding can be improved by removing potential false connections between contigs. In this study, a novel scaffolding algorithm called iLSLS is proposed, based on a Loose-Strict-Loose path extension strategy and contig error correction. iLSLS helps reduce the false relationships between contigs and improves the accuracy of subsequent steps. iLSLS uses a scoring function that estimates the correctness of candidate paths from the distribution of paired reads, and it tries to extend along the highest-scoring path. In addition, iLSLS can precisely estimate the gap size. We conducted experiments on two real datasets, and the results show that the LSLS strategy efficiently increases the correctness of scaffolds and that iLSLS performs better than other scaffolding methods.
|
28
|
Forouzan E, Shariati P, Mousavi Maleki MS, Karkhane AA, Yakhchali B. Practical evaluation of 11 de novo assemblers in metagenome assembly. J Microbiol Methods 2018; 151:99-105. [PMID: 29953874 DOI: 10.1016/j.mimet.2018.06.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 02/18/2018] [Revised: 06/16/2018] [Accepted: 06/23/2018] [Indexed: 11/18/2022]
Abstract
Next Generation Sequencing (NGS) technologies are revolutionizing the field of biology and metagenomic-based research. Since the volume of metagenomic data is typically very large, de novo metagenomic assembly can be effectively used to reduce the total amount of data and enhance the quality of downstream analysis, such as annotation and binning. Although there are many freely available assemblers, selecting one suitable for a specific goal can be highly challenging. In this study, the performance of 11 well-known assemblers was evaluated in the assembly of three different metagenomes. The results obtained show that metaSPAdes is the best assembler and Megahit is a good choice for a conservative assembly strategy. In addition, this research provides useful information regarding the pros and cons of each assembler and the effect of read length on assembly, thereby helping scholars to select the optimal assembler based on their objectives.
Affiliation(s)
- Esmaeil Forouzan
- Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Parvin Shariati
- Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Masoumeh Sadat Mousavi Maleki
- Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Ali Asghar Karkhane
- Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
- Bagher Yakhchali
- Institute of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
|
29
|
Monat C, Pera B, Ndjiondjop MN, Sow M, Tranchant-Dubreuil C, Bastianelli L, Ghesquière A, Sabot F. De Novo Assemblies of Three Oryza glaberrima Accessions Provide First Insights about Pan-Genome of African Rices. Genome Biol Evol 2018; 9:1-6. [PMID: 28173009 PMCID: PMC5381527 DOI: 10.1093/gbe/evw253] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 10/12/2016] [Indexed: 11/12/2022] Open
Abstract
Oryza glaberrima is one of the two cultivated species of rice, and harbors various interesting agronomic traits, especially in biotic and abiotic resistance, compared with its Asian cousin O. sativa. A previous reference genome was published, but newer studies highlighted some missing parts. Moreover, it is now known that a single individual cannot represent the diversity of a whole species. For that purpose, we sequenced, assembled, and annotated de novo three different cultivars of O. glaberrima. After validating our assemblies, we were able to better resolve complex regions than the previous assembly and to provide a first insight into pan-genomic divergence between individuals. The three assemblies showed large common regions, but almost 25% of the genome presents collinearity breakpoints or is even individual specific.
Affiliation(s)
- Cécile Monat
- RICE Team, DIADE UMR 232 IRD/UM, IRD France Sud, Montpellier, France
- Bérengère Pera
- RICE Team, DIADE UMR 232 IRD/UM, IRD France Sud, Montpellier, France; CEA//Genoscope, Evry, France
- Leila Bastianelli
- Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier, France
- Alain Ghesquière
- RICE Team, DIADE UMR 232 IRD/UM, IRD France Sud, Montpellier, France
- Francois Sabot
- RICE Team, DIADE UMR 232 IRD/UM, IRD France Sud, Montpellier, France
|
30
|
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Curr Protoc Hum Genet 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
|
31
|
Wu W, Jiang DC, Sun FH. Next-generation sequencing yields the complete mitochondrial genome of the Shangrila hot-spring snakes (Thermophis shangrila; Reptilia: Colubridae). Mitochondrial DNA B Resour 2017; 2:327-328. [PMID: 33473816 PMCID: PMC7799666 DOI: 10.1080/23802359.2017.1331330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
In this study, we sequenced the complete mitochondrial genome of Thermophis shangrila by using the next-generation sequencing technique. The total length of the mitogenome was 17,407 bp, which was composed of 13 protein coding genes, two rRNA genes (12s and 16s rRNA), 22 tRNA genes, and two control regions (CRI and CRII). The base composition was 32.6% for A, 23.9% for T, 30.0% for C, and 13.5% for G. We added a fragment about 150 bp in length at control region I, which Peng et al. failed to obtain using Sanger dideoxy sequencing.
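The base composition reported above can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the percentages and the mitogenome length stated in the abstract; the variable names are illustrative, and the per-base counts are approximations derived from rounded percentages, not values reported by the authors.

```python
# Base composition (percent) and total length as stated in the abstract.
composition = {"A": 32.6, "T": 23.9, "C": 30.0, "G": 13.5}
length_bp = 17_407

# The four percentages should account for the whole sequence.
total = round(sum(composition.values()), 1)

# GC and AT content follow directly from the reported percentages.
gc_content = round(composition["G"] + composition["C"], 1)
at_content = round(composition["A"] + composition["T"], 1)

# Approximate base counts (an estimate only, since the percentages are rounded).
approx_counts = {base: round(length_bp * pct / 100) for base, pct in composition.items()}

print(f"total = {total}%, GC = {gc_content}%, AT = {at_content}%")
print(approx_counts)
```

The reported values are internally consistent: they sum to 100%, and the genome is AT-rich, with a GC content of 43.5%.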
Affiliation(s)
- Wei Wu
- Engineering Laboratory of Prevention and Control of Veterinary Drug Residues in Animal Derived Food, Chengdu Medical College, Chengdu, China
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
- De-Chun Jiang
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
- Feng-Hui Sun
- Engineering Laboratory of Prevention and Control of Veterinary Drug Residues in Animal Derived Food, Chengdu Medical College, Chengdu, China
|
32
|
Lin J, Kramna L, Autio R, Hyöty H, Nykter M, Cinek O. Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples. BMC Genomics 2017; 18:378. [PMID: 28506246 PMCID: PMC5430618 DOI: 10.1186/s12864-017-3721-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/28/2017] [Accepted: 04/25/2017] [Indexed: 02/06/2023] Open
Abstract
Background: Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols.
Results: We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table.
Conclusions: The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped (‘dark matter’) reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.
Affiliation(s)
- Jake Lin
- BioMediTech and Faculty of Medicine and Life Sciences, University of Tampere, PB 100, FI-33014, Tampere, Finland
- Lenka Kramna
- Department of Pediatrics, 2nd Faculty of Medicine, Charles University and University Hospital Motol, V Úvalu 84, 150 06, Praha 5, Czech Republic
- Reija Autio
- School of Social Sciences, University of Tampere, Kalevantie 4, 33100, Tampere, Finland
- Heikki Hyöty
- BioMediTech and Faculty of Medicine and Life Sciences, University of Tampere, PB 100, FI-33014, Tampere, Finland; Fimlab Laboratories, Pirkanmaa Hospital District, Tampere, Finland
- Matti Nykter
- BioMediTech and Faculty of Medicine and Life Sciences, University of Tampere, PB 100, FI-33014, Tampere, Finland
- Ondrej Cinek
- Department of Pediatrics, 2nd Faculty of Medicine, Charles University and University Hospital Motol, V Úvalu 84, 150 06, Praha 5, Czech Republic
|
33
|
Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems 2017; 156-157:72-85. [PMID: 28392341 DOI: 10.1016/j.biosystems.2017.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/07/2017] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]
Abstract
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality.
Affiliation(s)
- Shakuntala Baichoo
- Department of Computer Science & Engineering, University of Mauritius, Réduit 80837, Mauritius.
- Christos A Ouzounis
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica 57001, Greece
|
34
|
Jiang Y, Fan W, Xu J. De novo transcriptome analysis and antimicrobial peptides screening in skin of Paa boulengeri. Genes Genomics 2017. [DOI: 10.1007/s13258-017-0532-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
|
35
|
Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics. Indian J Microbiol 2016; 57:23-38. [PMID: 28148977 DOI: 10.1007/s12088-016-0629-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/01/2016] [Accepted: 10/27/2016] [Indexed: 01/06/2023] Open
Abstract
Advances in next-generation sequencing technologies have driven the rapid evolution of genomics and metagenomics in a short time and at nominal cost. While metagenomics and genomics can be used separately to reveal culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at the population level, revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics, which started by answering "who is out there?" based on the 16S rRNA gene, has evolved immensely: precise organismal reconstruction at the species/strain level from deeply covered metagenome data now outweighs the need to isolate bacteria, of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenomic-derived genomes in providing insights into the evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment-specific genome modulations via metagenomic recruitments in terms of gene loss/gain and accessory and core-genome extent. We further illustrated the benefit of (meta)genomics in the understanding of infectious diseases by deducing the relationship between human microbiota and clinical microbiology. This review summarizes the technological advances in (meta)genomic strategies that use genome and metagenome datasets together to increase the resolution of microbial population studies.
|
36
|
Hao C, Xia Z, Fan R, Tan L, Hu L, Wu B, Wu H. De novo transcriptome sequencing of black pepper (Piper nigrum L.) and an analysis of genes involved in phenylpropanoid metabolism in response to Phytophthora capsici. BMC Genomics 2016; 17:822. [PMID: 27769171 PMCID: PMC5075214 DOI: 10.1186/s12864-016-3155-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 04/19/2016] [Accepted: 10/11/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Piper nigrum L., or "black pepper", is an economically important spice crop in tropical regions. Black pepper production is markedly affected by foot rot disease caused by Phytophthora capsici, and genetic improvement of black pepper is essential for combating foot rot diseases. However, little is known about the mechanism of resistance to P. capsici in black pepper. The molecular mechanisms underlying foot rot susceptibility were studied by comparing transcriptomes between resistant (Piper flaviflorum) and susceptible (Piper nigrum cv. Reyin-1) black pepper species.
RESULTS: A total of 116,432 unigenes were acquired from six libraries (three replicates of resistant and susceptible black pepper samples), which were integrated by applying BLAST similarity searches and annotated by adopting Kyoto Encyclopaedia of Genes and Gene Ontology (GO) genome orthology identifiers. The reference transcriptome was mapped using two sets of digital gene expression data. Using GO enrichment analysis for the differentially expressed genes, the majority of the genes associated with the phenylpropanoid biosynthesis pathway were identified in P. flaviflorum. In addition, gene expression analysis revealed that after susceptible and resistant species were inoculated with P. capsici, the majority of genes involved in the phenylpropanoid metabolism pathway were up-regulated in both species. Among various treatments and organs, all the genes were up-regulated to a relatively high degree in the resistant species. Phenylalanine ammonia lyase and peroxidase enzyme activity increased in susceptible and resistant species after inoculation with P. capsici, and increased faster in the resistant species. Histochemical analysis of lignin revealed that the resistant plants retained their vascular structure.
CONCLUSIONS: Our data provide critical information regarding target genes and a technological basis for future studies of black pepper genetic improvements, including transgenic breeding.
Affiliation(s)
- Chaoyun Hao
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Key Laboratory of Genetic Resources Utilization of Spice and Beverage Crops, Ministry of Agriculture, Wanning, Hainan 571533 China
- Zhiqiang Xia
- Institute of Tropical Biosciences and Biotechnology, Chinese Academy of Tropical Agricultural Sciences (CATAS), Haikou, 571101 China
- Rui Fan
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Hainan Provincial Key Laboratory of Genetic Improvement and Quality Regulation for Tropical Spice and Beverage Crops, Wanning, Hainan 571533 China
- Lehe Tan
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Key Laboratory of Genetic Resources Utilization of Spice and Beverage Crops, Ministry of Agriculture, Wanning, Hainan 571533 China
- Hainan Provincial Key Laboratory of Genetic Improvement and Quality Regulation for Tropical Spice and Beverage Crops, Wanning, Hainan 571533 China
- Lisong Hu
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Key Laboratory of Genetic Resources Utilization of Spice and Beverage Crops, Ministry of Agriculture, Wanning, Hainan 571533 China
- Baoduo Wu
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Hainan Provincial Key Laboratory of Genetic Improvement and Quality Regulation for Tropical Spice and Beverage Crops, Wanning, Hainan 571533 China
- Huasong Wu
- Spice and Beverage Research Institute, Chinese Academy of Tropical Agricultural Sciences (CATAS), Wanning, Hainan 571533 China
- Key Laboratory of Genetic Resources Utilization of Spice and Beverage Crops, Ministry of Agriculture, Wanning, Hainan 571533 China
- Hainan Provincial Key Laboratory of Genetic Improvement and Quality Regulation for Tropical Spice and Beverage Crops, Wanning, Hainan 571533 China
|
37
|
Pai TW, Chen CM. SSRs as genetic markers in the human genome and their observable relationship to hereditary diseases. Biomark Med 2016; 10:563-6. [PMID: 27232109 DOI: 10.2217/bmm-2016-0094] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022] Open
Affiliation(s)
- Tun-Wen Pai
- Department of Computer Science & Engineering, National Taiwan Ocean University, No. 2, Pei-Ning Road, Keelung 20224, Taiwan, R.O.C
- Chien-Ming Chen
- Department of Computer Science & Engineering, National Taiwan Ocean University, No. 2, Pei-Ning Road, Keelung 20224, Taiwan, R.O.C
|
38
|
Huptas C, Scherer S, Wenning M. Optimized Illumina PCR-free library preparation for bacterial whole genome sequencing and analysis of factors influencing de novo assembly. BMC Res Notes 2016; 9:269. [PMID: 27176120 PMCID: PMC4864918 DOI: 10.1186/s13104-016-2072-9] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 11/19/2015] [Accepted: 05/02/2016] [Indexed: 01/09/2023] Open
Abstract
Background: Next-generation sequencing (NGS) technology has paved the way for rapid and cost-efficient de novo sequencing of bacterial genomes. In particular, the introduction of PCR-free library preparation procedures (LPPs) led to major improvements as PCR bias is largely reduced. However, in order to facilitate the assembly of Illumina paired-end sequence data and to enhance assembly performance, an increase of insert sizes to facilitate the repeat bridging and resolution capabilities of current state of the art assembly tools is needed. In addition, information concerning the relationships between genomic GC content, library insert size and sequencing quality as well as the influence of library insert size, read length and sequencing depth on assembly performance would be helpful to specifically target sequencing projects.
Results: Optimized DNA fragmentation settings and fine-tuned resuspension buffer to bead buffer ratios during fragment size selection were integrated in the Illumina TruSeq® DNA PCR-free LPP in order to produce sequencing libraries varying in average insert size for bacterial genomes within a range of 35.4–73.0% GC content. The modified protocol consumes only half of the reagents per sample, thus doubling the number of preparations possible with a kit. Examination of different libraries revealed that sequencing quality decreases with increased genomic GC content and with larger insert sizes. The estimation of assembly performance using assembly metrics like corrected NG50 and NGA50 showed that libraries with larger insert sizes can result in substantial assembly improvements as long as appropriate assembly tools are chosen. However, such improvements seem to be limited to genomes with a low to medium GC content. A positive trend between read length and assembly performance was observed while sequencing depth is less important, provided a minimum coverage is reached.
Conclusions: Based on the optimized protocol developed, sequencing libraries with flexible insert sizes and lower reagent costs can be generated. Furthermore, increased knowledge about the interplay of sequencing quality, insert size, genomic GC content, read length, sequencing depth and the assembler used will help molecular biologists to set up an optimal experimental and analytical framework with respect to Illumina next-generation sequencing of bacterial genomes.
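The NG50 metric named in this abstract can be illustrated with a short sketch. The Python below assumes the commonly used definition (sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the estimated genome size). The function name and the toy contig lengths are hypothetical and are not taken from the study; the "corrected" NG50 and NGA50 variants additionally depend on alignment to a reference and are not shown.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the contigs
    together cover less than half of the genome size."""
    half = genome_size / 2
    running = 0
    # Walk through contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            # This contig is the one that pushes coverage past 50%.
            return length
    return None


# Hypothetical contig lengths (bp) for a toy genome of 1,000 bp.
toy_contigs = [400, 300, 150, 100, 50]
print(ng50(toy_contigs, 1000))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so assemblies of the same genome can be compared even when their total lengths differ.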
Affiliation(s)
- Christopher Huptas
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs-und Lebensmittelforschung (ZIEL), Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Siegfried Scherer
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs-und Lebensmittelforschung (ZIEL), Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Mareike Wenning
- Lehrstuhl für Mikrobielle Ökologie, Zentralinstitut für Ernährungs-und Lebensmittelforschung (ZIEL), Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
|
39
|
|
40
|
Warnke-Sommer J, Ali H. Graph mining for next generation sequencing: leveraging the assembly graph for biological insights. BMC Genomics 2016; 17:340. [PMID: 27154001 PMCID: PMC4859950 DOI: 10.1186/s12864-016-2678-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/09/2016] [Accepted: 04/22/2016] [Indexed: 01/02/2023] Open
Abstract
Background: The assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principal objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing for both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes.
Results: Focus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn’s disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn’s disease and healthy data sets was conducted with focus on antibiotics resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotics resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers.
Conclusions: Mining the hybrid graph can reveal biological phenomena captured by its structure. We demonstrate the advantages of considering assembly graphs as data-mining support in addition to their role as frameworks for assembly.
Affiliation(s)
- Julia Warnke-Sommer
- College of Information Science and Technology, University of Nebraska Omaha, Omaha, NE, 68182, USA; Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- Hesham Ali
- College of Information Science and Technology, University of Nebraska Omaha, Omaha, NE, 68182, USA; Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, 68198, USA
|
41
|
Bivens NJ, Zhou M. RNA-Seq Library Construction Methods for Transcriptome Analysis. Curr Protoc Plant Biol 2016; 1:197-215. [PMID: 31725988 DOI: 10.1002/cppb.20019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Next-generation sequencing (NGS) technologies have revolutionized the study of genomics with an ever-expanding list of applications. RNA-Seq has emerged as a powerful method, applying transcriptome analysis to a wider range of organisms, most significantly non-model organisms lacking prior genomic sequencing. Whereas an initial concern of NGS datasets was the potential limitation of short read lengths, short read sequences have been successfully employed in the creation of de novo transcriptome assemblies that allow for subsequent mapping of reads for expression analysis. Prior genomic sequence knowledge is no longer a requirement for identification of functional transcriptional elements and for global gene expression characterization. Significant cost reductions in generating RNA-Seq data, and improvements in de novo assemblers, have allowed the analysis of transcriptomes in heretofore unsequenced plant species. These protocols describe standard methods for constructing RNA-Seq libraries to be sequenced on Illumina sequencing platforms for comprehensive transcriptome analysis. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Nathan J Bivens
- DNA Core Facility, University of Missouri, Columbia, Missouri
| | - Mingyi Zhou
- DNA Core Facility, University of Missouri, Columbia, Missouri
|
42
|
Qiu D, Xu L, Vandemark G, Chen W. Comparative Transcriptome Analysis between the Fungal Plant Pathogens Sclerotinia sclerotiorum and S. trifoliorum Using RNA Sequencing. J Hered 2015; 107:163-72. [PMID: 26615185 DOI: 10.1093/jhered/esv092] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/02/2015] [Accepted: 11/06/2015] [Indexed: 12/12/2022] Open
Abstract
The fungal plant pathogens Sclerotinia sclerotiorum and S. trifoliorum are morphologically similar, but differ considerably in host range. In an effort to elucidate mechanisms of the host range difference, transcriptomes of the 2 species at vegetative growth stage were compared to gain further insight into commonality and uniqueness in gene expression and pathogenic mechanisms of the 2 closely related pathogens. A total of 23133 and 21043 unique transcripts were obtained from S. sclerotiorum and S. trifoliorum, respectively. Approximately 43% of the transcripts were genes with known functions for both species. Among 1411 orthologous contigs, about 10% (147) were more highly (>3-fold) expressed in S. trifoliorum than in S. sclerotiorum, and about 12% (173) of the orthologs were more highly (>3-fold) expressed in S. sclerotiorum than in S. trifoliorum. The expression levels of genes on supercontig 30 have the highest correlation coefficient value between the 2 species. Twenty-seven contigs were found to be new and unique for S. trifoliorum. Additionally, differences in expressed genes involved in pathogenesis like oxalate biosynthesis and endopolygalacturonases were detected between the 2 species. The analyses of the transcriptomes not only discovered similarities and uniqueness in gene expression between the 2 closely related species, providing additional information for annotating the S. sclerotiorum genome, but also provided a foundation for comparing the transcriptomes with host-infecting transcriptomes.
Affiliation(s)
- Dan Qiu
- From the Department of Plant Pathology, Washington State University, Pullman, WA 99164 (Qiu and Xu); and Grain Legume Genetics and Physiology Research, USDA-ARS, Washington State University, Pullman, WA 99164 (Vandemark and Chen). Dan Qiu is now at the Division of Plant Science and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211
| | - Liangsheng Xu
- From the Department of Plant Pathology, Washington State University, Pullman, WA 99164 (Qiu and Xu); and Grain Legume Genetics and Physiology Research, USDA-ARS, Washington State University, Pullman, WA 99164 (Vandemark and Chen). Dan Qiu is now at the Division of Plant Science and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211
| | - George Vandemark
- From the Department of Plant Pathology, Washington State University, Pullman, WA 99164 (Qiu and Xu); and Grain Legume Genetics and Physiology Research, USDA-ARS, Washington State University, Pullman, WA 99164 (Vandemark and Chen). Dan Qiu is now at the Division of Plant Science and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211
| | - Weidong Chen
- From the Department of Plant Pathology, Washington State University, Pullman, WA 99164 (Qiu and Xu); and Grain Legume Genetics and Physiology Research, USDA-ARS, Washington State University, Pullman, WA 99164 (Vandemark and Chen). Dan Qiu is now at the Division of Plant Science and National Center for Soybean Biotechnology, University of Missouri, Columbia, MO 65211.
|
43
|
Abstract
The Ascomycete Onygenales order embraces a diverse group of mammalian pathogens, including the yeast-forming dimorphic fungal pathogens Histoplasma capsulatum, Paracoccidioides spp. and Blastomyces dermatitidis, the dermatophytes Microsporum spp. and Trichophyton spp., the spherule-forming dimorphic fungal pathogens in the genus Coccidioides, and many nonpathogens. Although genomes for all of the aforementioned pathogenic species are available, only one nonpathogen had been sequenced. Here, we enhance comparative phylogenomics in Onygenales by adding genomes for Amauroascus mutatus, Amauroascus niger, Byssoonygena ceratinophila, and Chrysosporium queenslandicum, four nonpathogenic Onygenales species, all of which are more closely related to Coccidioides spp. than any other known Onygenales species. Phylogenomic detection of gene family expansion and contraction can provide clues to fungal function but is sensitive to taxon sampling. By adding additional nonpathogens, we show that LysM domain-containing proteins, previously thought to be expanding in some Onygenales, are contracting in the Coccidioides-Uncinocarpus clade, as are the self-nonself recognition Het loci. The denser genome sampling presented here highlights nearly 800 genes unique to Coccidioides, which have significantly fewer known protein domains and show increased expression in the endosporulating spherule, the parasitic phase unique to Coccidioides spp. These genomes provide insight to gene family expansion/contraction and patterns of individual gene gain/loss in this diverse order, both major drivers of evolutionary change. Our results suggest that gene family expansion/contraction can lead to adaptive radiations that create taxonomic orders, while individual gene gain/loss likely plays a more significant role in branch-specific phenotypic changes that lead to adaptation for species or genera.
|
44
|
Zhu F, Yuan JM, Zhang ZH, Hao JP, Yang YZ, Hu SQ, Yang FX, Qu LJ, Hou ZC. De novo transcriptome assembly and identification of genes associated with feed conversion ratio and breast muscle yield in domestic ducks. Anim Genet 2015; 46:636-45. [DOI: 10.1111/age.12361] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Accepted: 08/07/2015] [Indexed: 12/30/2022]
Affiliation(s)
- Feng Zhu
- Department of Animal Genetics and Breeding; National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding; China Agricultural University; Beijing 100193 China
| | - Jian-Ming Yuan
- Department of Animal Nutrition; China Agricultural University; Beijing 100193 China
| | - Zhen-He Zhang
- Department of Animal Genetics and Breeding; National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding; China Agricultural University; Beijing 100193 China
| | - Jin-Ping Hao
- Beijing Jinxing Golden Star Duck Center; Beijing 100076 China
| | - Yu-ze Yang
- Beijing General Station of Animal Husbandry; Beijing 100107 China
| | - Shen-Qiang Hu
- Beijing Jinxing Golden Star Duck Center; Beijing 100076 China
| | - Fang-Xi Yang
- Beijing Jinxing Golden Star Duck Center; Beijing 100076 China
| | - Lu-Jiang Qu
- Department of Animal Genetics and Breeding; National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding; China Agricultural University; Beijing 100193 China
| | - Zhuo-Cheng Hou
- Department of Animal Genetics and Breeding; National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding; China Agricultural University; Beijing 100193 China
|
45
|
Kinjo Y, Saitoh S, Tokuda G. An Efficient Strategy Developed for Next-Generation Sequencing of Endosymbiont Genomes Performed Using Crude DNA Isolated from Host Tissues: A Case Study of Blattabacterium cuenoti Inhabiting the Fat Bodies of Cockroaches. Microbes Environ 2015; 30:208-20. [PMID: 26156552 PMCID: PMC4567559 DOI: 10.1264/jsme2.me14153] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023] Open
Abstract
Whole-genome sequencing has emerged as one of the most effective means to elucidate the biological roles and molecular features of obligate intracellular symbionts (endosymbionts). However, the de novo assembly of an endosymbiont genome remains a challenge when host and/or mitochondrial DNA sequences are present in a dataset and hinder the assembly of the genome. By focusing on the traits of genome evolution in endosymbionts, we herein developed and investigated a genome-assembly strategy that consisted of two consecutive procedures: the selection of endosymbiont contigs from an output obtained from a de novo assembly performed using a TBLASTX search against a reference genome, named TBLASTX Contig Selection and Filtering (TCSF), and the iterative reassembling of the genome from reads mapped on the selected contigs, named Iterative Mapping and ReAssembling (IMRA), to merge the contigs. In order to validate this approach, we sequenced two strains of the cockroach endosymbiont Blattabacterium cuenoti and applied this strategy to the datasets. TCSF was determined to be highly accurate and sensitive in contig selection even when the genome of a distantly related free-living bacterium was used as a reference genome. Furthermore, the use of IMRA markedly improved sequence assemblies: the genomic sequence of an endosymbiont was almost completed from a dataset containing only 3% of the sequences of the endosymbiont’s genome. The efficiency of our strategy may facilitate further studies on endosymbionts.
Affiliation(s)
- Yukihiro Kinjo
- Tropical Biosphere Research Center, University of the Ryukyus
|
46
|
Pan L, Liu Y, Wei Q, Xiao C, Ji Q, Bao G, Wu X. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus). PLoS One 2015; 10:e0124583. [PMID: 25955442 PMCID: PMC4425669 DOI: 10.1371/journal.pone.0124583] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/11/2014] [Accepted: 02/19/2015] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. METHODOLOGY/PRINCIPAL FINDINGS Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by the Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10^-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interaction, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. We also chose a key gene for SNP analysis to identify differences between plaice and un-plaice phenotype rabbits. CONCLUSIONS/SIGNIFICANCE The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development.
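The abstract above reports an assembly N50 without defining it. As a rough illustration only, N50 is commonly taken to be the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The sketch below assumes that definition; the function name `n50` and the toy lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and report the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy values for illustration only (not taken from the rabbit assembly).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

With the toy input, the two longest sequences (500 and 400) already cover more than half of the 1,500 total bases, so the function returns 400.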
Affiliation(s)
- Lei Pan
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
- Chemistry and Life Science, Zhejiang Normal University, Jinhua, Zhejiang, China
| | - Yan Liu
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
| | - Qiang Wei
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
| | - Chenwen Xiao
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
| | - Quanan Ji
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
| | - Guolian Bao
- Animal Husbandry and Veterinary Institute, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang, China
| | - Xinsheng Wu
- College of Animal Science and Technology, Yangzhou University, Yangzhou, Jiangsu, China
|
47
|
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 177] [Impact Index Per Article: 19.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022] Open
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Department of Biology, University of Ghent, Ghent, Belgium
- Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
| | - Paraskevi Polymenakou
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
| | - Georgios Kotoulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
| | - Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
|
48
|
Sim M, Kim J. Metagenome assembly through clustering of next-generation sequencing data using protein sequences. J Microbiol Methods 2015; 109:180-7. [PMID: 25572018 DOI: 10.1016/j.mimet.2015.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2014] [Revised: 01/03/2015] [Accepted: 01/03/2015] [Indexed: 11/16/2022]
Abstract
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising.
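The abstract above does not spell out how the "probabilistic choices" in step (ii) are made. One plausible reading, sketched here purely as an assumption, is that each consensus position is drawn at random in proportion to how often each base occurs at that position among the reads of a cluster. The function name `probabilistic_consensus`, the pre-aligned equal-length input, and the use of `-` as a gap symbol are all hypothetical choices for illustration, not the authors' published procedure.

```python
import random
from collections import Counter


def probabilistic_consensus(aligned_reads, rng):
    """Build a consensus string from equal-length, pre-aligned reads.

    For each alignment column, one base is sampled with probability
    proportional to its frequency in that column. Gap symbols ('-')
    are ignored, and columns containing only gaps are skipped.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            continue  # nothing but gaps in this column
        bases = sorted(counts)  # fixed order keeps seeded runs reproducible
        weights = [counts[base] for base in bases]
        consensus.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(consensus)


# Toy cluster for illustration only; the third column is ambiguous (G vs T).
cluster = ["ACGT", "ACGT", "ACTT", "AC-T"]
print(probabilistic_consensus(cluster, random.Random(0)))
```

Sampling rather than taking a strict majority keeps minority variants represented across repeated runs, which may matter when a cluster mixes reads from closely related organisms; whether the original method works this way is not stated in the abstract.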
Affiliation(s)
- Mikang Sim
- Department of Animal Biotechnology, Konkuk University, Seoul 143-701, Republic of Korea
| | - Jaebum Kim
- Department of Animal Biotechnology, Konkuk University, Seoul 143-701, Republic of Korea.
|
49
|
Vijayakumar P, Raut AA, Kumar P, Sharma D, Mishra A. De novo assembly and analysis of crow lungs transcriptome. Genome 2015; 57:499-506. [PMID: 25633965 DOI: 10.1139/gen-2014-0122] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The jungle crow (Corvus macrorhynchos) belongs to the order Passeriformes of bird species and is important for avian ecological and evolutionary genetics studies. However, there is limited information on the transcriptome data of this species. In the present study, we report the characterization of the lung transcriptome of the jungle crow using GS FLX Titanium XLR70. Altogether, 1,510,303 high-quality sequence reads with 581,198,230 bases were de novo assembled into 22,169 isotigs (isotig represents an individual transcript) and 784,009 singletons. Using these isotigs and 581,681 length-filtered (greater than 300 bp) singletons, 20,010 unique protein-coding genes were identified by BLASTx comparison against a nonredundant (nr) protein sequence database. Comparative analysis revealed that 46,604 (70.29%) and 51,642 (72.48%) of the assembled transcripts have significant similarity to zebra finch and chicken RefSeq proteins, respectively. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Transcripts putatively involved in the immune response were identified. Furthermore, 20,599 single nucleotide polymorphisms (SNPs) and 7525 simple sequence repeats (SSRs) were retrieved from the assembled transcript database. This resource should lay an important base for future ecological, evolutionary, and conservation genetic studies on this species and in other related species.
Affiliation(s)
- Periyasamy Vijayakumar
- High Security Animal Disease Laboratory, Indian Veterinary Research Institute, Anand Nagar, Bhopal-462021, Madhya Pradesh, India
|
50
|
Bratcher HB, Corton C, Jolley KA, Parkhill J, Maiden MCJ. A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes. BMC Genomics 2014; 15:1138. [PMID: 25523208 PMCID: PMC4377854 DOI: 10.1186/1471-2164-15-1138] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 10/02/2014] [Accepted: 12/04/2014] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Highly parallel, 'second generation' sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary. RESULTS The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses and the data are publicly available in the PubMLST Neisseria database. CONCLUSIONS The de novo assembly, combined with automated gene-by-gene annotation, generates high quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease causing lineages.
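The abstract above defines the core genome as the loci present in at least 95% of the population. A minimal sketch of that thresholding idea follows. It is not the Genome Comparator tool or its interface: the function name `core_loci`, the dictionary-of-sets input, and the isolate and locus identifiers are all hypothetical.

```python
from collections import Counter


def core_loci(presence, min_fraction=0.95):
    """Return loci found in at least `min_fraction` of isolates.

    `presence` maps an isolate identifier to the set of locus
    identifiers detected in that isolate's genome.
    """
    if not presence:
        raise ValueError("need at least one isolate")
    n_isolates = len(presence)
    counts = Counter()
    for loci in presence.values():
        counts.update(set(loci))  # count each locus once per isolate
    return {
        locus
        for locus, n_present in counts.items()
        if n_present / n_isolates >= min_fraction
    }


# Toy presence data for illustration only (identifiers are made up).
toy_presence = {
    "isolate_1": {"locus_a", "locus_b"},
    "isolate_2": {"locus_a", "locus_b"},
    "isolate_3": {"locus_a"},
    "isolate_4": {"locus_a", "locus_b", "locus_c"},
}
print(sorted(core_loci(toy_presence, min_fraction=0.75)))
```

With four toy isolates and a 0.75 threshold, `locus_a` (4 of 4) and `locus_b` (3 of 4) qualify while `locus_c` (1 of 4) does not. A threshold below 100% tolerates loci that are missing from a few draft assemblies because of gaps rather than true absence.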
|