1
Samantray D, Tanwar AS, Murali TS, Brand A, Satyamoorthy K, Paul B. A Comprehensive Bioinformatics Resource Guide for Genome-Based Antimicrobial Resistance Studies. OMICS: A Journal of Integrative Biology 2023; 27:445-460. [PMID: 37861712 DOI: 10.1089/omi.2023.0140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2023]
Abstract
The use of high-throughput sequencing technologies and bioinformatic tools has greatly transformed microbial genome research. With the help of sophisticated computational tools, it has become easier to perform whole genome assembly, identify and compare different species based on their genomes, and predict the presence of genes responsible for proteins, antimicrobial resistance (AMR), and toxins. These bioinformatics resources are likely to keep improving in quality, become more user-friendly for analyzing multiple genomic datasets, grow more efficient at generating information and translating it into meaningful knowledge, and enhance our understanding of the genetic mechanisms of AMR. In this manuscript, we provide an essential guide for selecting popular resources for microbial research, such as genome assembly and annotation, antibiotic resistance gene profiling, identification of virulence factors, and drug interaction studies. In addition, we discuss the best practices in computer-oriented microbial genome research, emerging trends in microbial genomic data analysis, integration of multi-omics data, the appropriate use of machine-learning algorithms, and open-source bioinformatics resources for genome data analytics.
Affiliation(s)
- Debyani Samantray
  - Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, India
- Ankit Singh Tanwar
  - United Nations University-Maastricht Economic and Social Research Institute on Innovation and Technology (UNU-MERIT), Maastricht, The Netherlands
  - Faculty of Health, Medicine and Life Sciences (FHML), Maastricht University, Maastricht, The Netherlands
- Thokur Sreepathy Murali
  - Department of Biotechnology, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, India
- Angela Brand
  - United Nations University-Maastricht Economic and Social Research Institute on Innovation and Technology (UNU-MERIT), Maastricht, The Netherlands
  - Faculty of Health, Medicine and Life Sciences (FHML), Maastricht University, Maastricht, The Netherlands
  - Department of Health Information, Prasanna School of Public Health (PSPH), Manipal Academy of Higher Education, Manipal, India
- Kapaettu Satyamoorthy
  - SDM College of Medical Sciences and Hospital, Shri Dharmasthala Manjunatheshwara (SDM) University, Dharwad, India
- Bobby Paul
  - Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, India
2
Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023; 233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Aquaculture has witnessed an excellent growth rate during the last two decades and offers huge potential to provide nutritional as well as livelihood security. Genomic research has contributed significantly toward the development of beneficial technologies for aquaculture. Existing high-throughput technologies, such as next-generation technologies, generate vast amounts of data that require extensive analysis using appropriate tools. Bioinformatics is a rapidly evolving science that integrates gene-based information and computational technology to produce new knowledge for the benefit of aquaculture. Bioinformatics provides new opportunities as well as challenges for information and data processing in new-generation aquaculture. Rapid technical advancements have opened up a world of possibilities for using current genomics to improve aquaculture performance. Understanding the genes that govern economically relevant characteristics necessitates a significant amount of additional research. The various data sources include next-generation DNA sequencing, protein sequencing, RNA sequencing gene expression profiles, metabolic pathways, molecular markers, and so on. Appropriate bioinformatics tools are developed to mine biologically relevant and commercially useful results. The purpose of this scoping review is to present various arms of diverse bioinformatics tools with special emphasis on practical translation to the aquaculture industry.
Affiliation(s)
- Mohd Ashraf Rather
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e- Kashmir University of Agricultural Science and Technology, Kashmir, India.
- Deepak Agarwal
  - Institute of Fisheries Post Graduation Studies OMR Campus, Vaniyanchavadi, Chennai, India
- Irfan Ahamd Khan
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e- Kashmir University of Agricultural Science and Technology, Kashmir, India
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Sujit Kumar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Adnan Amin
  - Postgraduate Institute of Fisheries Education and Research Kamdhenu University, Gandhinagar-India University of Kurasthra, India; Department of Aquatic Environmental Management, Faculty of Fisheries Rangil- Ganderbel -SKUAST-K, India
- Jitendra Kumar Sundaray
  - ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, Odisha 751002, India
- Tahiya Qadri
  - Division of Food Science and Technology, SKUAST-K, Shalimar, India
3
Ryšavý P, Železný F. Reference-free phylogeny from sequencing data. BioData Min 2023; 16:13. [PMID: 36973746 PMCID: PMC10045052 DOI: 10.1186/s13040-023-00329-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Motivation
Clustering of genetic sequences is one of the key parts of bioinformatics analyses. Resulting phylogenetic trees are beneficial for solving many research questions, including tracing the history of species, studying migration in the past, or tracing a source of a virus outbreak. At the same time, biologists provide more data in the raw form of reads or only on contig-level assembly. Therefore, tools that are able to process those data without supervision need to be developed.
Results
In this paper, we present a tool for reference-free phylogeny capable of handling data where no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and the combination of the latter. The tool provides an estimation of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared to the previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.
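As a point of reference for the quantity being estimated, the following is a minimal Python sketch of the exact Levenshtein (edit) distance between two short sequences, using the standard dynamic-programming recurrence. It is not the tool's estimator, which works on reads and contigs without a finished assembly; the function name `levenshtein` and the toy sequences are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance via dynamic programming, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            substitution = previous[j - 1] + (ca != cb)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical toy sequences that differ by a single substitution.
    print(levenshtein("ACGTTGCA", "ACGTAGCA"))
```

The exact computation costs time proportional to the product of the two sequence lengths, which is one reason an estimate is attractive for genome-scale data.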
4
Rakkammal K, Priya A, Pandian S, Maharajan T, Rathinapriya P, Satish L, Ceasar SA, Sohn SI, Ramesh M. Conventional and Omics Approaches for Understanding the Abiotic Stress Response in Cereal Crops-An Updated Overview. Plants (Basel, Switzerland) 2022; 11:plants11212852. [PMID: 36365305 PMCID: PMC9655223 DOI: 10.3390/plants11212852] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/21/2022] [Revised: 10/19/2022] [Accepted: 10/22/2022] [Indexed: 05/22/2023]
Abstract
Cereals have evolved various tolerance mechanisms to cope with abiotic stress. Understanding the abiotic stress response mechanism of cereal crops at the molecular level offers a path to high-yielding and stress-tolerant cultivars to sustain food and nutritional security. In this regard, enormous progress has been made in the omics field in the areas of genomics, transcriptomics, and proteomics. Omics approaches generate a massive amount of data, and adequate advancements in computational tools have been achieved for effective analysis. The combination of integrated omics and bioinformatics approaches has been recognized as vital to generating insights into genome-wide stress-regulation mechanisms. In this review, we have described the self-driven drought, heat, and salt stress-responsive mechanisms that are highlighted by the integration of stress-manipulating components, including transcription factors, co-expressed genes, proteins, etc. This review also provides a comprehensive catalog of available online omics resources for cereal crops and their effective utilization. Thus, the details provided in the review will enable us to choose the appropriate tools and techniques to reduce the negative impacts and limit the failures in the intensive crop improvement study.
Affiliation(s)
- Kasinathan Rakkammal
  - Department of Biotechnology, Science Campus, Alagappa University, Karaikudi 630003, Tamil Nadu, India
- Arumugam Priya
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC 27606, USA
- Subramani Pandian
  - Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Theivanayagam Maharajan
  - Department of Biosciences, Rajagiri College of Social Sciences, Cochin 683104, Kerala, India
- Periyasamy Rathinapriya
  - Department of Biotechnology, Science Campus, Alagappa University, Karaikudi 630003, Tamil Nadu, India
- Lakkakula Satish
  - Applied Phycology and Biotechnology Division, Marine Algal Research Station, Mandapam Camp, CSIR-Central Salt and Marine Chemicals Research Institute, Bhavnagar 623519, Tamil Nadu, India
- Soo-In Sohn
  - Department of Agricultural Biotechnology, National Institute of Agricultural Sciences, Rural Development Administration, Jeonju 54874, Korea
- Manikandan Ramesh
  - Department of Biotechnology, Science Campus, Alagappa University, Karaikudi 630003, Tamil Nadu, India
  - Correspondence:
5
Reddy B, Mehta S, Prakash G, Sheoran N, Kumar A. Structured Framework and Genome Analysis of Magnaporthe grisea Inciting Pearl Millet Blast Disease Reveals Versatile Metabolic Pathways, Protein Families, and Virulence Factors. J Fungi (Basel) 2022; 8:jof8060614. [PMID: 35736098 PMCID: PMC9225118 DOI: 10.3390/jof8060614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/10/2022] [Accepted: 05/06/2022] [Indexed: 12/19/2022] Open
Abstract
Magnaporthe grisea (T.T. Herbert) M.E. Barr is a major fungal phytopathogen that causes blast disease in cereals, resulting in economic losses worldwide. An in-depth understanding of the basis of virulence and ecological adaptation of M. grisea is vital for devising effective disease management strategies. Here, we aimed to determine the genomic basis of the pathogenicity and underlying biochemical pathways in Magnaporthe using the genome sequence of a pearl millet-infecting M. grisea PMg_Dl generated by dual NGS techniques, Illumina NextSeq 500 and PacBio RS II. The short and long nucleotide reads were draft assembled into 341 contigs, giving a genome size of 47.89 Mb with an N50 value of 765.4 Kb. Magnaporthe grisea PMg_Dl showed an average nucleotide identity (ANI) of 86% and 98% with M. oryzae and Pyricularia pennisetigena, respectively. The gene-calling method revealed a total of 10,218 genes and 10,184 protein-coding sequences in the genome of PMg_Dl. InterProScan of the predicted proteins showed 3637 distinct protein families and 695 superfamilies in the PMg_Dl genome. In silico virulence analysis revealed the presence of 51 virulence factors (VFs) and 539 CAZymes in the genome. The genomic regions for the biosynthesis of cellulolytic endo-glucanase and beta-glucosidase, as well as pectinolytic endo-polygalacturonase, pectin-esterase, and pectate-lyases, were detected. Signaling pathways modulated by MAPK, PI3K-Akt, AMPK, and mTOR were also deciphered. Multicopy sequences suggestive of transposable elements such as Type LTR, LTR/Copia, LTR/Gypsy, DNA/TcMar-Fot1, and Type LINE were recorded. The genomic resource presented here will be of use in the development of molecular markers and diagnostics, population genetics, disease management, and molecular taxonomy, and will also provide a genomic reference for ascomycetous genome investigations in the future.
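The N50 value quoted in the abstract is a standard assembly contiguity statistic. A minimal Python sketch of how it is commonly computed follows; the function name `n50` and the contig lengths are hypothetical and are not taken from the PMg_Dl assembly.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in kb (total 290 kb).
    print(n50([80, 70, 50, 40, 30, 20]))
```

In this toy case the two longest contigs (80 + 70 = 150 kb) already cover more than half of 290 kb, so the N50 is 70 kb.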
Affiliation(s)
- Bhaskar Reddy
  - Division of Plant Pathology, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi 110012, India; (G.P.); (N.S.)
  - Correspondence: (B.R.); (A.K.)
- Sahil Mehta
  - Crop Improvement Group, International Centre for Genetic Engineering and Biotechnology, New Delhi 110067, India;
- Ganesan Prakash
  - Division of Plant Pathology, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi 110012, India; (G.P.); (N.S.)
- Neelam Sheoran
  - Division of Plant Pathology, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi 110012, India; (G.P.); (N.S.)
- Aundy Kumar
  - Division of Plant Pathology, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi 110012, India; (G.P.); (N.S.)
  - Correspondence: (B.R.); (A.K.)
6
Hekkala ER, Colten R, Cunningham SW, Smith O, Ikram S. Using Mitogenomes to Explore the Social and Ecological Contexts of Crocodile Mummification in Ancient Egypt. Bulletin of the Peabody Museum of Natural History 2022. [DOI: 10.3374/014.063.0101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Evon R. Hekkala
  - Department of Biological Sciences, Fordham University, Bronx, NY 10458 USA
- Roger Colten
  - Division of Anthropology, Peabody Museum of Natural History, Yale University, New Haven, CT 06520–8118 USA
- Seth W. Cunningham
  - Department of Biological Sciences, Fordham University, Bronx, NY 10458 USA
- Oliver Smith
  - Micropathology, Ltd., University of Warwick Science Park, Coventry, CV4 7EZ, United Kingdom
- Salima Ikram
  - Department of Sociology, Egyptology, and Anthropology, The American University in Cairo, Cairo, Egypt
7
Dermatitis during Spaceflight Associated with HSV-1 Reactivation. Viruses 2022; 14:v14040789. [PMID: 35458519 PMCID: PMC9028032 DOI: 10.3390/v14040789] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/02/2022] [Revised: 03/21/2022] [Accepted: 03/25/2022] [Indexed: 02/04/2023] Open
Abstract
Human alpha herpesviruses herpes simplex virus (HSV-1) and varicella zoster virus (VZV) establish latency in various cranial nerve ganglia and often reactivate in response to stress-associated immune system dysregulation. Reactivation of Epstein Barr virus (EBV), VZV, HSV-1, and cytomegalovirus (CMV) is typically asymptomatic during spaceflight, though live/infectious virus has been recovered and the shedding rate increases with mission duration. The risk of clinical disease, therefore, may increase for astronauts assigned to extended missions (>180 days). Here, we report, for the first time, a case of HSV-1 skin rash (dermatitis) occurring during long-duration spaceflight. The astronaut reported persistent dermatitis during flight, which was treated onboard with oral antihistamines and topical/oral steroids. No HSV-1 DNA was detected in 6-month pre-mission saliva samples, but on flight day 82, saliva and rash swabs yielded 4.8 copies/ng DNA and 5.3 × 10^4 copies/ng DNA, respectively. Post-mission saliva samples continued to have a high infectious HSV-1 load (1.67 × 10^7 copies/ng DNA). HSV-1 from both rash and saliva samples had 99.9% genotype homology. Additional physiological monitoring data, including stress biomarkers (cortisol, dehydroepiandrosterone (DHEA), and salivary amylase), immune markers (adaptive regulatory and inflammatory plasma cytokines), and biochemical profile markers, including vitamin/mineral status and bone metabolism, are also presented for this case. These data highlight an atypical presentation of HSV-1 during spaceflight and underscore the importance of viral screening during clinical evaluations of in-flight dermatitis to determine viral etiology and guide treatment.
8
Genome characterization of the novel lytic genome sequence of the phage YUEEL01 of the Myoviridae family. Virus Res 2021; 309:198670. [PMID: 34971703 DOI: 10.1016/j.virusres.2021.198670] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/13/2021] [Revised: 12/19/2021] [Accepted: 12/21/2021] [Indexed: 12/24/2022]
Abstract
Antimicrobial resistance is a global concern because of its rapid emergence in the environment and the associated high risk to human and animal health. Municipal wastewater, including urban, hospital, and pharmaceutical effluent, is the primary source of contamination by antibiotics and antibiotic-resistant bacteria (ARB). Biological processes are commonly used for wastewater treatment. Biologically based strategies are a promising approach to effective integrated ARB control because they focus on antibiotic resistance. An effective bacteriophage against multi-drug resistance (MDR) microbes in municipal wastewater was.
9
Song MH, Yan C, Li JT. MEANGS: an efficient seed-free tool for de novo assembling animal mitochondrial genome using whole genome NGS data. Brief Bioinform 2021; 23:6481918. [PMID: 34941991 DOI: 10.1093/bib/bbab538] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/18/2021] [Revised: 10/23/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022] Open
Abstract
Advances in next-generation sequencing (NGS) technologies have led to an exponential increase in the number of whole genome sequences (WGS) in databases. This wealth of WGS data has greatly facilitated the recovery of full mitochondrial genomes (mitogenomes), which are vital for phylogenetic, evolutionary and ecological studies. Unfortunately, most existing software cannot assemble mitogenome reference sequences conveniently or efficiently. Therefore, we developed a seed-free de novo assembly tool, MEANGS, which applies the trie-search method to extend contigs from self-discovery seeds and assemble a mitogenome from animal WGS data. We then used data of differing quality from 16 species to compare the performance of MEANGS with three other available programs. MEANGS exhibited the best overall performance since it was the only one that completed all tests, and it assembled full or partial mitogenomes for all of the tested samples while the others failed. Furthermore, MEANGS selects superior assembly sequences and annotates protein-coding genes. Thus, MEANGS may be one of the most efficient software tools to date for generating high-quality mitogenomes, and its further use will benefit mitogenome studies based on whole genome NGS data. MEANGS is available at https://github.com/YanCCscu/meangs.
Affiliation(s)
- Meng-Huan Song
  - CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Chaochao Yan
  - CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Jia-Tang Li
  - CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
10
Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021; 16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022] Open
Abstract
Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself, and the diagnosis only takes place at the data analysis stage, where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether truly no evidence of a pathogen is present in the data, or whether the pathogen's genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
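One common way to obtain candidate protein segments from an unclassified read is six-frame translation. The Python sketch below illustrates that general idea with the standard genetic code; whether the authors' pipeline uses exactly this step is an assumption, and the names `translate` and `six_frame` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (seq, reverse_complement)
        for offset in range(3)
    ]


if __name__ == "__main__":
    # Hypothetical six-base read; real reads are far longer.
    print(six_frame("ATGGCC"))
```

Each of the six protein strings could then be passed to a classifier; stop codons appear as `*` and ambiguous bases yield `X`.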
Affiliation(s)
- Benjamin Voigt
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Oliver Fischer
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Krumnow
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Herta
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Piotr Wojciech Dabrowski
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
11
Ye H, Cheng L, Ju B, Xu G, Liu Y, Zhang S, Wang L, Zhang Z. SCIGA: Software for large-scale, single-cell immunoglobulin repertoire analysis. Gigascience 2021; 10:giab050. [PMID: 34585238 PMCID: PMC8478610 DOI: 10.1093/gigascience/giab050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2020] [Revised: 03/19/2021] [Accepted: 06/28/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: B-cell immunoglobulin repertoires with paired heavy and light chains can be determined by means of 10X single-cell V(D)J sequencing. Precise and quick analysis of 10X single-cell immunoglobulin repertoires remains a challenge owing to the high diversity of immunoglobulin repertoires and a lack of specialized software that can analyze such diverse data. FINDINGS: In this study, specialized software for 10X single-cell immunoglobulin repertoire analysis was developed. SCIGA (Single-Cell Immunoglobulin Repertoire Analysis) is an easy-to-use pipeline that performs read trimming, immunoglobulin sequence assembly and annotation, heavy and light chain pairing, statistical analysis, visualization, and multiple sample integration analysis, all of which is achieved with a 1-line command. SCIGA was then used to profile the single-cell immunoglobulin repertoires of 9 patients with coronavirus disease 2019 (COVID-19). Four neutralizing antibodies against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were identified from these repertoires. CONCLUSIONS: SCIGA provides a complete and quick analysis for 10X single-cell V(D)J sequencing datasets. It can help researchers to interpret B-cell immunoglobulin repertoires with paired heavy and light chains.
Affiliation(s)
- Haocheng Ye
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
  - CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences (CAS), Beijing 100101, China
- Lin Cheng
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
- Bin Ju
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
- Gang Xu
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
- Yang Liu
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
- Shuye Zhang
  - Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China
- Lifei Wang
  - Department of Radiology, National Clinical Research Center for Infectious Disease, Shenzhen, Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
- Zheng Zhang
  - Institute for Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, The Second Affiliated Hospital, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 518112, China
12
Dida F, Yi G. Empirical evaluation of methods for de novo genome assembly. PeerJ Comput Sci 2021; 7:e636. [PMID: 34307867 PMCID: PMC8279138 DOI: 10.7717/peerj-cs.636] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/04/2021] [Accepted: 06/19/2021] [Indexed: 06/12/2023]
Abstract
Technologies for next-generation sequencing (NGS) have stimulated an exponential rise in high-throughput sequencing projects and resulted in the development of new read-assembly algorithms. A drastic reduction in the costs of generating short reads on the genomes of new organisms is attributable to recent advances in NGS technologies such as Ion Torrent, Illumina, and PacBio. Genome research has led to the creation of high-quality reference genomes for several organisms, and de novo assembly is a key initiative that has facilitated gene discovery and other studies. More powerful analytical algorithms are needed to work on the increasing amount of sequence data. We make a thorough comparison of the de novo assembly algorithms to allow new users to clearly understand them: overlap-layout-consensus, de Bruijn graph, string-graph-based assembly, and hybrid approaches. We also address the computational efficacy of each algorithm's performance, challenges faced by the assembly tools used, and the impact of repeats. Our results compare the relative performance of the different assemblers and other related assembly differences with and without the reference genome. We hope that this analysis will help further the application of de novo sequences and support the future growth of assembly algorithms.
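To make the de Bruijn graph approach named above concrete, here is a minimal Python sketch of the graph construction step only: each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It is a teaching sketch under simplifying assumptions (error-free reads, a single strand, no path finding), not any evaluated assembler's implementation; `de_bruijn_edges` and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it.

    Repeated entries in a suffix list record how often a k-mer was seen.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two hypothetical overlapping reads and a small k for readability.
    for prefix, suffixes in de_bruijn_edges(["ACGT", "CGTA"], k=3).items():
        print(prefix, "->", suffixes)
```

An assembler would go on to find paths through this graph to spell out contigs; repeats longer than k - 1 create branching nodes, which is one way the impact of repeats shows up.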
Affiliation(s)
- Firaol Dida
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
- Gangman Yi
  - Department of Multimedia Engineering, Dongguk University, Seoul, South Korea
13
Clark RD, Aardema ML, Andolfatto P, Barber PH, Hattori A, Hoey JA, Montes HR, Pinsky ML. Genomic signatures of spatially divergent selection at clownfish range margins. Proc Biol Sci 2021; 288:20210407. [PMID: 34102891 PMCID: PMC8187997 DOI: 10.1098/rspb.2021.0407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/19/2021] [Accepted: 05/11/2021] [Indexed: 01/25/2023] Open
Abstract
Understanding how evolutionary forces interact to drive patterns of selection and distribute genetic variation across a species' range is of great interest in ecology and evolution, especially in an era of global change. While theory predicts how and when populations at range margins are likely to undergo local adaptation, empirical evidence testing these models remains sparse. Here, we address this knowledge gap by investigating the relationship between selection, gene flow and genetic drift in the yellowtail clownfish, Amphiprion clarkii, from the core to the northern periphery of the species range. Analyses reveal low genetic diversity at the range edge, gene flow from the core to the edge and genomic signatures of local adaptation at 56 single nucleotide polymorphisms in 25 candidate genes, most of which are significantly correlated with minimum annual sea surface temperature. Several of these candidate genes play a role in functions that are upregulated during cold stress, including protein turnover, metabolism and translation. Our results illustrate how spatially divergent selection spanning the range core to the periphery can occur despite the potential for strong genetic drift at the range edge and moderate gene flow from the core populations.
Collapse
Affiliation(s)
- René D. Clark
- Department of Ecology, Evolution and Natural Resources, Rutgers University, 14 College Farm Road, New Brunswick, NJ 08901, USA
| | - Matthew L. Aardema
- Department of Biology, Montclair State University, 1 Normal Avenue, Montclair, NJ 07043, USA
- Sackler Institute for Comparative Genomics, American Museum of Natural History, 200 Central Park West, New York, NY 10024-5102, USA
| | - Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY 10026, USA
| | - Paul H. Barber
- Department of Ecology and Evolutionary Biology, University of California-Los Angeles, Los Angeles, CA 90095, USA
| | - Akihisa Hattori
- Faculty of Liberal Arts and Education, Shiga University, 2-5-1 Hiratsu, Otsu, Shiga 520-0862, Japan
| | - Jennifer A. Hoey
- Department of Ecology, Evolution and Natural Resources, Rutgers University, 14 College Farm Road, New Brunswick, NJ 08901, USA
- Department of Ecology and Evolutionary Biology, University of California-Santa Cruz, 130 McAllister Way, Santa Cruz, CA 95060, USA
| | | | - Malin L. Pinsky
- Department of Ecology, Evolution and Natural Resources, Rutgers University, 14 College Farm Road, New Brunswick, NJ 08901, USA
| |
|
14
|
Wani GA, Khan MA, Dar MA, Shah MA, Reshi ZA. Next Generation High Throughput Sequencing to Assess Microbial Communities: An Application Based on Water Quality. BULLETIN OF ENVIRONMENTAL CONTAMINATION AND TOXICOLOGY 2021; 106:727-733. [PMID: 33774727 DOI: 10.1007/s00128-021-03195-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/10/2020] [Accepted: 03/13/2021] [Indexed: 06/12/2023]
Abstract
Traditional techniques to identify different contaminants (biological or chemical) in water are slow, laborious, and can require specialized expertise. Hence, the rapid determination of water quality using more sensitive and reliable metagenomics-based approaches attains special importance. Metagenomics deals with the study of genetic material that is recovered from microbial communities present in environmental samples. In traditional techniques, cultivation-based methodologies were used to describe the diversity of microorganisms in environmental samples. These have failed to function as robust markers because of limited taxonomic and phylogenetic implications. Against this backdrop, high-throughput DNA sequencing approaches have proven very powerful in microbial source tracking because they allow the full variety of genome-based analyses, such as microbial genetic diversity and population structure, to be investigated. Next generation sequencing technologies can reveal a greater proportion of microbial communities that have not been reported earlier by traditional techniques. The present review highlights the shift from traditional techniques for the basic study of community composition to next-generation sequencing (NGS) platforms and their potential applications to the biomonitoring of water quality in relation to human health.
Affiliation(s)
- Gowher A Wani
- Department of Botany, University of Kashmir, Srinagar, Jammu & Kashmir, 190 006, India.
| | - Mohd Asgar Khan
- Department of Botany, University of Kashmir, Srinagar, Jammu & Kashmir, 190 006, India
| | - Mudasir A Dar
- Department of Botany, University of Kashmir, Srinagar, Jammu & Kashmir, 190 006, India
| | - Manzoor A Shah
- Department of Botany, University of Kashmir, Srinagar, Jammu & Kashmir, 190 006, India
| | - Zafar A Reshi
- Department of Botany, University of Kashmir, Srinagar, Jammu & Kashmir, 190 006, India
| |
|
15
|
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023] Open
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
|
16
|
De Mori G, Zaina G, Franco-Orozco B, Testolin R, De Paoli E, Cipriani G. Targeted Mutagenesis of the Female-Suppressor SyGI Gene in Tetraploid Kiwifruit by CRISPR/CAS9. PLANTS 2020; 10:plants10010062. [PMID: 33396671 PMCID: PMC7823651 DOI: 10.3390/plants10010062] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/23/2020] [Revised: 12/18/2020] [Accepted: 12/27/2020] [Indexed: 11/16/2022]
Abstract
Kiwifruit belong to the genus Actinidia with 54 species apparently all functionally dioecious. The sex-determinants of the type XX/XY, with male heterogametic, operate independently of the ploidy level. Recently, the SyGI protein has been described as the suppressor of female development. In the present study, we exploited the CRISPR/Cas9 technology by targeting two different sites in the SyGI gene in order to induce a stable gene knock-out in two tetraploid male accessions of Actinidia chinensis var. chinensis. The two genotypes showed a regenerative efficiency of 58% and 73%, respectively. Despite not yet being able to verify the phenotypic effects on the flower structure, due to the long time required by tissue-cultured kiwifruit plants to flower, we obtained two regenerated lines showing near fixation of a unique modification in their genome, resulting in both cases in the onset of a premature stop codon, which induces the putative gene knock-out. Evaluation of gRNA1 locus for both regenerated plantlets resulted in co-amplification of a minor variant differing from the target region for a single nucleotide. A genomic duplication of the region in proximity of the Y genomic region could be postulated.
Affiliation(s)
- Gloria De Mori
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
- Correspondence:
| | - Giusi Zaina
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
| | - Barbara Franco-Orozco
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
- Facultad de Ingeniería, Tecnológico de Antioquia–Institución Universitaria TdeA, Calle 78b No. 72A-220, Medellín-Antioquia 050001, Colombia
| | - Raffaele Testolin
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
| | - Emanuele De Paoli
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
| | - Guido Cipriani
- Department of Agricultural, Food, Environmental and Animal Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy; (G.Z.); (B.F.-O.); (R.T.); (E.D.P.); (G.C.)
| |
|
17
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic, and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most of such tools have not focused on scalability issues that are inherent in proteogenomic data analysis where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case of how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| | - Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
| | - Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
| |
|
18
|
Yao W, Li Y, Xie W, Wang L. Features of sRNA biogenesis in rice revealed by genetic dissection of sRNA expression level. Comput Struct Biotechnol J 2020; 18:3207-3216. [PMID: 33209208 PMCID: PMC7649420 DOI: 10.1016/j.csbj.2020.10.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/23/2020] [Revised: 09/24/2020] [Accepted: 10/11/2020] [Indexed: 01/25/2023] Open
Abstract
We previously conducted a QTL analysis of small RNA (sRNA) abundance in flag leaves of an immortalized rice F2 (IMF2) population by aligning sRNA reads to the reference genome to quantify the expression levels of sRNAs. However, this approach missed about half of the sRNAs as only 50% of all sRNA reads could be uniquely aligned to the reference genome. Here, we quantified the expression levels of sRNAs and sRNA clusters without the use of a reference genome. QTL analysis of the expression levels of sRNAs and sRNA clusters confirmed the feasibility of this approach. sRNAs and sRNA clusters with identified QTLs were then aligned to the high-quality parental genomes of the IMF2 population to resolve the identified QTLs into local vs. distant regulation mode. We were able to detect new QTL hotspots by considering sRNAs aligned to multiple positions of the parental genomes and sRNAs unaligned to the parental genomes. We found that several local-QTL hotspots were caused by sequence variations in long inverted repeats, which probably function as precursors of sRNAs, between the two parental genomes. The expression levels of these sRNAs were significantly associated with the presence/absence of the long inverted repeats in the IMF2 population. Moreover, we found that the variations in whole-genome sRNA species composition among different IMF2s were attributed to sRNA biogenesis genes including OsDCL2b and OsRDR2. Our results highlight that genetic dissection of sRNA expression is a promising approach to disclose new components functioning in sRNA biogenesis and new mechanisms of sRNA biogenesis.
Affiliation(s)
- Wen Yao
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China; National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China
| | - Yang Li
- National Key Laboratory of Wheat and Maize Crop Science, College of Life Sciences, Henan Agricultural University, Zhengzhou 450002, China
| | - Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China
| | - Lei Wang
- National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China
| |
|
19
|
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes. PredictHaplo outperformed the other haplotype reconstruction tools in regard to highest precision, recall, and lowest UniFrac distance values, followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
Affiliation(s)
- Anton Eliseev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
| | - Pavel Avdeyev
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA
| | - Dmitry Novik
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
| | - Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
| | - Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
| | - Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
| |
|
20
|
Iaconelli M, Bonanno Ferraro G, Mancini P, Suffredini E, Veneri C, Ciccaglione AR, Bruni R, Della Libera S, Bignami F, Brambilla M, De Medici D, Brandtner D, Schembri P, D’Amato S, La Rosa G. Nine-Year Nationwide Environmental Surveillance of Hepatitis E Virus in Urban Wastewaters in Italy (2011-2019). INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E2059. [PMID: 32244915 PMCID: PMC7143501 DOI: 10.3390/ijerph17062059] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/27/2020] [Revised: 03/18/2020] [Accepted: 03/18/2020] [Indexed: 12/19/2022]
Abstract
Hepatitis E virus (HEV) is an emerging causative agent of acute hepatitis worldwide. To provide insights into the epidemiology of HEV in Italy, a large-scale investigation was conducted into urban sewage over nine years (2011-2019), collecting 1374 sewage samples from 48 wastewater treatment plants located in all the 20 regions of Italy. Broadly reactive primers targeting the ORF1 and ORF2 regions were used for the detection and typing of HEV, followed by Sanger and next generation sequencing (NGS). Real-time RT-qPCR was also used to attempt quantification of positive samples. HEV RNA detection occurred in 74 urban sewage samples (5.4%), with a statistically significant higher frequency (7.1%) in central Italy. Fifty-six samples were characterized as G3 strains and 18 as G1. While the detection of G3 strains occurred in all the surveillance period, G1 strains were mainly detected in 2011-2012, and never in 2017-2019. Typing was achieved in 2 samples (3f subtype). Viral concentrations in quantifiable samples ranged from 1.2 × 10³ g.c./L to 2.8 × 10⁴ g.c./L. Our results suggest the considerable circulation of the virus in the Italian population, despite a relatively small number of notified cases, a higher occurrence in central Italy, and a noteworthy predominance of G3 strains.
Affiliation(s)
- Marcello Iaconelli
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Giusy Bonanno Ferraro
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Pamela Mancini
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Elisabetta Suffredini
- Department of Food Safety, Nutrition and Veterinary Public Health, Istituto Superiore di Sanità, 00161 Rome, Italy; (E.S.); (D.D.M.)
| | - Carolina Veneri
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Anna Rita Ciccaglione
- Department Infectious Diseases, Istituto Superiore di Sanità, 00161 Rome, Italy; (A.R.C.); (R.B.)
| | - Roberto Bruni
- Department Infectious Diseases, Istituto Superiore di Sanità, 00161 Rome, Italy; (A.R.C.); (R.B.)
| | - Simonetta Della Libera
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Francesco Bignami
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| | - Massimo Brambilla
- Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria (CREA), Research Centre for Engineering and Agri Food Processing, 24047 Treviglio, BG, Italy;
| | - Dario De Medici
- Department of Food Safety, Nutrition and Veterinary Public Health, Istituto Superiore di Sanità, 00161 Rome, Italy; (E.S.); (D.D.M.)
| | | | - Pietro Schembri
- Regional Department for Health Activities and Epidemiological Observatory of the Sicilian Region, 90146 Palermo, Italy;
| | - Stefania D’Amato
- Ministry of Health, Directorate-General for Prevention, 00144 Rome, Italy;
| | - Giuseppina La Rosa
- Department of Environment and Health - Istituto Superiore di Sanità, 00161 Rome, Italy; (M.I.); (G.B.F.); (P.M.); (C.V.); (S.D.L.); (F.B.)
| |
|
21
|
Fish I, Stenfeldt C, Palinski RM, Pauszek SJ, Arzt J. Into the Deep (Sequence) of the Foot-and-Mouth Disease Virus Gene Pool: Bottlenecks and Adaptation during Infection in Naïve and Vaccinated Cattle. Pathogens 2020; 9:pathogens9030208. [PMID: 32178297 PMCID: PMC7157448 DOI: 10.3390/pathogens9030208] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/05/2020] [Revised: 03/06/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open
Abstract
Foot-and-mouth disease virus (FMDV) infects hosts as a population of closely related viruses referred to as a quasispecies. The behavior of this quasispecies has not been described in detail in natural host species. In this study, virus samples collected from vaccinated and non-vaccinated cattle up to 35 days post-experimental infection with FMDV A24-Cruzeiro were analyzed by deep-sequencing. Vaccination induced significant differences compared to viruses from non-vaccinated cattle in substitution rates, entropy, and evidence for adaptation. Genomic variation detected during early infection reflected the diversity inherited from the source virus (inoculum), whereas by 12 days post infection, dominant viruses were defined by newly acquired mutations. Mutations conferring recognized fitness gain occurred and were associated with selective sweeps. Persistent infections always included multiple FMDV subpopulations, suggesting distinct foci of infection within the nasopharyngeal mucosa. Subclinical infection in vaccinated cattle included very early bottlenecks associated with reduced diversity within virus populations. Viruses from both animal cohorts contained putative antigenic escape mutations. However, these mutations occurred during later stages of infection, at which time transmission is less likely to occur. This study improves upon previously published work by analyzing deep sequences of samples, allowing for detailed characterization of FMDV populations over time within multiple hosts.
Affiliation(s)
- Ian Fish
- Foreign Animal Disease Research Unit, Plum Island Animal Disease Center, ARS, USDA, Orient, NY 11957, USA; (I.F.); (C.S.); (R.M.P.); (S.J.P.)
- Oak Ridge Institute for Science and Education, PIADC Research Participation Program, Oak Ridge, TN 37830, USA
| | - Carolina Stenfeldt
- Foreign Animal Disease Research Unit, Plum Island Animal Disease Center, ARS, USDA, Orient, NY 11957, USA; (I.F.); (C.S.); (R.M.P.); (S.J.P.)
- College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, USA
| | - Rachel M. Palinski
- Foreign Animal Disease Research Unit, Plum Island Animal Disease Center, ARS, USDA, Orient, NY 11957, USA; (I.F.); (C.S.); (R.M.P.); (S.J.P.)
| | - Steven J. Pauszek
- Foreign Animal Disease Research Unit, Plum Island Animal Disease Center, ARS, USDA, Orient, NY 11957, USA; (I.F.); (C.S.); (R.M.P.); (S.J.P.)
| | - Jonathan Arzt
- Foreign Animal Disease Research Unit, Plum Island Animal Disease Center, ARS, USDA, Orient, NY 11957, USA; (I.F.); (C.S.); (R.M.P.); (S.J.P.)
- Correspondence:
| |
|
22
|
Chu YH, Zhong W, Rehrauer W, Pavelec DM, Ong IM, Arjang D, Patel SS, Hu R. Clinicopathologic Characterization of Post-Renal Transplantation BK Polyomavirus-Associated Urothelial Carcinoma: Single Institutional Experience. Am J Clin Pathol 2020; 153:303-314. [PMID: 31628837 DOI: 10.1093/ajcp/aqz167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023] Open
Abstract
OBJECTIVES To review rare cases of BK polyomavirus (BKPyV) associated urologic carcinomas in kidney transplant recipients at one institution and in the literature. METHODS We describe the clinicopathologic features of BKPyV-associated urologic carcinomas in a single-institution cohort. RESULTS Among 4,772 kidney recipients during 1994 to 2014, 26 (0.5%) and 26 (0.5%) developed posttransplantation urothelial carcinomas (UCs) and renal cell carcinomas (RCCs), respectively, as of 2017. Six (27%) UCs but none of the RCCs expressed large T antigen (TAg). TAg-expressing UCs were high grade with p16 and p53 overexpression (P < .05 compared to TAg-negative UCs). Tumor genome sequencing revealed BKPyV integration and a lack of pathogenic mutations in 50 cancer-relevant genes. Compared to TAg-negative UCs, TAg-expressing UCs more frequently presented at advanced stages (50% T3-T4) with lymph node involvement (50%) and higher UC-specific mortality (50%). CONCLUSIONS Post-renal transplantation BKPyV-associated UCs are aggressive and genetically distinct from most non-BKPyV-related UCs.
Affiliation(s)
- Ying-Hsia Chu
- Department of Pathology and Laboratory Medicine, Madison
| | - Weixiong Zhong
- Department of Pathology and Laboratory Medicine, Madison
- Department of Pathology and Laboratory Medicine Service, William S. Middleton Memorial Veterans Hospital, Madison, WI
| | | | - Derek M Pavelec
- Department of Bioinformatics Resource Center, University of Wisconsin Biotechnology Center, Madison
- Department of Cancer Informatics Shared Resource, University of Wisconsin Carbone Cancer Center, Madison
| | - Irene M Ong
- Department of Bioinformatics Resource Center, University of Wisconsin Biotechnology Center, Madison
| | - Djamali Arjang
- Department of Medicine, University of Wisconsin Hospital and Clinics, Madison
| | - Sanjay S Patel
- Department of Pathology and Laboratory Medicine, Madison
| | - Rong Hu
- Department of Pathology and Laboratory Medicine, Madison
| |
|
23
|
Eisfeldt J, Mårtensson G, Ameur A, Nilsson D, Lindstrand A. Discovery of Novel Sequences in 1,000 Swedish Genomes. Mol Biol Evol 2020; 37:18-30. [PMID: 31560401 PMCID: PMC6984370 DOI: 10.1093/molbev/msz176] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/18/2022] Open
Abstract
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NS across the chimpanzee genome, we find that 2,807 NS align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NS common throughout the Swedish population. The NSs were searched for repeats and repeat elements: revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NS, as well as the previously published Pan-African NS: revealing that both the Swedish and Pan-African NS are widespread, and that the Swedish NSs are largely a subset of the Pan-African NS. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NS may be of ancestral origin.
Affiliation(s)
- Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Gustaf Mårtensson
- Division of Nanobiotechnology, Department of Protein Science, Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Daniel Nilsson
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden; Science for Life Laboratory, Karolinska Institutet Science Park, Solna, Sweden; Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institute, Stockholm, Sweden.,Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
|
24
|
Methods and Tools for Plant Organelle Genome Sequencing, Assembly, and Downstream Analysis. Methods Mol Biol 2020; 2107:49-98. [PMID: 31893443 DOI: 10.1007/978-1-0716-0235-5_4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/03/2023]
Abstract
Organelles play an important role in a eukaryotic cell. Among them, two organelles, chloroplasts and mitochondria, are responsible for the critical functions of photosynthesis and aerobic respiration, respectively. Organellar genomes are also very important for plant systematic studies. Here we describe methods for isolating mitochondrial and plastid DNA and sequencing it with NGS technology. We also discuss in detail the various tools available for assembly, annotation, and visualization of the organelle genome sequence.
|
25
|
Shipley MM, Renner DW, Pandey U, Ford B, Bloom DC, Grose C, Szpara ML. Personalized viral genomic investigation of herpes simplex virus 1 perinatal viremic transmission with dual fatality. Cold Spring Harb Mol Case Stud 2019; 5:mcs.a004382. [PMID: 31582464 PMCID: PMC6913147 DOI: 10.1101/mcs.a004382] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/16/2019] [Accepted: 09/04/2019] [Indexed: 11/25/2022] Open
Abstract
Here we present a personalized viral genomics approach to investigating a rare case of perinatal herpes simplex virus 1 (HSV-1) transmission that ended in death of both mother and neonate. We sought to determine whether the virus involved in this rare case had any unusual features that may have contributed to the dire patient outcome. A pregnant woman with negative HerpeSelect antibody test underwent cesarean section at 30 wk gestation and died the same day. The premature newborn died 5 d later. Both individuals were found postmortem to have positive blood HSV-1 PCR tests. Using oligonucleotide enrichment and deep sequencing, we determined that viral transmission from mother to infant was nearly perfect at the consensus genome level. At the virus population level, 77% of minor variants (MVs) in the mother's blood also appeared on the neonate's skin, of which more than half were disseminated into the neonate's blood. We also detected nonmaternal MVs that arose de novo in the neonate's viral populations. Of note, one de novo MV in the neonate's skin virus induced a nonsynonymous mutation in the UL6 protein, which is a component of the portal that allows DNA entry into new progeny capsids. This case suggests that perinatal viremic HSV-1 transmission includes the majority of genetic diversity from the maternal virus population and that new, nonsynonymous mutations can occur after relatively few rounds of replication. This report expands our understanding of viral transmission in humans and may lead to improved diagnostic strategies for neonatal HSV-1 acquisition.
Affiliation(s)
- Mackenzie M Shipley
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.,Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Daniel W Renner
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.,Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Utsav Pandey
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.,Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Bradley Ford
- Department of Pathology, University of Iowa, Iowa City, Iowa 52242, USA
| | - David C Bloom
- Department of Molecular Genetics and Microbiology, University of Florida College of Medicine, Gainesville, Florida 32610, USA
| | - Charles Grose
- Division of Infectious Disease/Virology, University of Iowa, Iowa City, Iowa 52242, USA
| | - Moriah L Szpara
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.,Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
26
|
Li F, Zhao X, Li M, He K, Huang C, Zhou Y, Li Z, Walters JR. Insect genomes: progress and challenges. INSECT MOLECULAR BIOLOGY 2019; 28:739-758. [PMID: 31120160 DOI: 10.1111/imb.12599] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 11/02/2017] [Revised: 03/22/2019] [Accepted: 05/14/2019] [Indexed: 05/24/2023]
Abstract
In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics.
Affiliation(s)
- F Li
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - X Zhao
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - M Li
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - K He
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - C Huang
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - Y Zhou
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - Z Li
- Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
| | - J R Walters
- Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
|
27
|
Ritch E, Fu SYF, Herberts C, Wang G, Warner EW, Schönlau E, Taavitsainen S, Murtha AJ, Vandekerkhove G, Beja K, Loktionova Y, Khalaf D, Fazli L, Kushnir I, Ferrario C, Hotte S, Annala M, Chi KN, Wyatt AW. Identification of Hypermutation and Defective Mismatch Repair in ctDNA from Metastatic Prostate Cancer. Clin Cancer Res 2019; 26:1114-1125. [PMID: 31744831 DOI: 10.1158/1078-0432.ccr-19-1623] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 05/17/2019] [Revised: 10/10/2019] [Accepted: 11/15/2019] [Indexed: 12/14/2022]
Abstract
PURPOSE: DNA mismatch repair defects (MMRd) and tumor hypermutation are rare and under-characterized in metastatic prostate cancer (mPC). Furthermore, because hypermutated MMRd prostate cancers can respond to immune checkpoint inhibitors, there is an urgent need for practical detection tools. EXPERIMENTAL DESIGN: We analyzed plasma cell-free DNA-targeted sequencing data from 433 patients with mPC with circulating tumor DNA (ctDNA) purity ≥2%. Samples with somatic hypermutation were subjected to 185× whole-exome sequencing and capture of mismatch repair gene introns. Archival tissue was analyzed with targeted sequencing and IHC. RESULTS: Sixteen patients (3.7%) had somatic hypermutation with MMRd etiology, evidenced by deleterious alterations in MSH2, MSH6, or MLH1, microsatellite instability, and characteristic trinucleotide signatures. ctDNA was concordant with mismatch repair protein IHC and DNA sequencing of tumor tissue. Tumor suppressors such as PTEN, RB1, and TP53 were inactivated by mutation rather than copy-number loss. Hotspot mutations in oncogenes such as AKT1, PIK3CA, and CTNNB1 were common, and the androgen receptor (AR)-ligand binding domain was mutated in 9 of 16 patients. We observed high intrapatient clonal diversity, evidenced by subclonal driver mutations and shifts in mutation allele frequency over time. Patients with hypermutation and MMRd etiology in ctDNA had a poor response to AR inhibition and inferior survival compared with a control cohort. CONCLUSIONS: Hypermutated MMRd mPC is associated with oncogene activation and subclonal diversity, which may contribute to a clinically aggressive disposition in selected patients. In patients with detectable ctDNA, cell-free DNA sequencing is a practical tool to prioritize this subtype for immunotherapy. See related commentary by Schweizer and Yu, p. 981.
Affiliation(s)
- Elie Ritch
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Simon Y F Fu
- Department of Medical Oncology, BC Cancer, British Columbia, Canada
| | - Cameron Herberts
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Gang Wang
- Department of Medical Oncology, BC Cancer, British Columbia, Canada
| | - Evan W Warner
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Elena Schönlau
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Sinja Taavitsainen
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada.,Prostate Cancer Research Center, Faculty of Medicine and Life Sciences and BioMediTech Institute, University of Tampere, Tampere, Finland
| | - Andrew J Murtha
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Gillian Vandekerkhove
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Kevin Beja
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Yulia Loktionova
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Daniel Khalaf
- Department of Medical Oncology, BC Cancer, British Columbia, Canada
| | - Ladan Fazli
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada
| | - Igal Kushnir
- The Ottawa Hospital Cancer Centre, University of Ottawa, Ottawa, Ontario, Canada.,Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | | | | | - Matti Annala
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada.,Prostate Cancer Research Center, Faculty of Medicine and Life Sciences and BioMediTech Institute, University of Tampere, Tampere, Finland
| | - Kim N Chi
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada.,Department of Medical Oncology, BC Cancer, British Columbia, Canada
| | - Alexander W Wyatt
- Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, British Columbia, Canada.
|
28
|
Luo J, Lyu M, Chen R, Zhang X, Luo H, Yan C. SLR: a scaffolding algorithm based on long reads and contig classification. BMC Bioinformatics 2019; 20:539. [PMID: 31666010 PMCID: PMC6820941 DOI: 10.1186/s12859-019-3114-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/19/2018] [Accepted: 09/23/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Scaffolding is an important step in genome assembly that orders and orients the contigs produced by assemblers. However, repetitive regions in contigs usually prevent scaffolding from producing accurate results. How to solve the problem of repetitive regions has received a great deal of attention. In the past few years, long reads sequenced by third-generation sequencing technologies (Pacific Biosciences and Oxford Nanopore) have been demonstrated to be useful for sequencing repetitive regions in genomes. Although some stand-alone scaffolding algorithms based on long reads have been presented, scaffolding still requires a new strategy to take full advantage of the characteristics of long reads. Results: Here, we present a new scaffolding algorithm based on long reads and contig classification (SLR). Using the alignment information of long reads and contigs, SLR classifies the contigs into unique contigs and ambiguous contigs to address the problem of repetitive regions. Next, SLR uses only unique contigs to produce draft scaffolds. Then, SLR inserts the ambiguous contigs into the draft scaffolds and produces the final scaffolds. We compare SLR to three popular scaffolding tools by using long read datasets sequenced with Pacific Biosciences and Oxford Nanopore technologies. The experimental results show that SLR can produce better results in terms of accuracy and completeness. The open-source code of SLR is available at https://github.com/luojunwei/SLR. Conclusion: In this paper, we describe SLR, which is designed to scaffold contigs using long reads. We conclude that SLR can improve the completeness of genome assembly.
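The idea of splitting contigs into unique and ambiguous ones before scaffolding can be illustrated with a small sketch. This is not SLR's actual classification rule, which the abstract does not spell out. The rule used here is an assumption for illustration only: a contig is treated as ambiguous when long reads place it next to more than one distinct neighbour on either side. The function name `classify_contigs` and the input format (one ordered list of contig IDs per long read, orientation ignored) are likewise hypothetical.

```python
from collections import defaultdict


def classify_contigs(read_paths):
    """Split contigs into (unique, ambiguous) sets.

    read_paths: list of lists; each inner list holds the contig IDs hit by
    one long read, in the order they appear along that read.
    Assumed rule (illustrative, not SLR's): a contig with more than one
    distinct left or right neighbour is ambiguous, e.g. a repeat copy.
    """
    left = defaultdict(set)   # contig -> contigs seen immediately before it
    right = defaultdict(set)  # contig -> contigs seen immediately after it
    contigs = set()

    for path in read_paths:
        contigs.update(path)
        for a, b in zip(path, path[1:]):
            right[a].add(b)
            left[b].add(a)

    unique, ambiguous = set(), set()
    for c in contigs:
        if len(left[c]) > 1 or len(right[c]) > 1:
            ambiguous.add(c)
        else:
            unique.add(c)
    return unique, ambiguous


# Toy input: contig "R" sits between A/B on some reads and between C/D on another.
reads = [["A", "R", "B"], ["C", "R", "D"], ["A", "R", "B"]]
unique, ambiguous = classify_contigs(reads)
```

In this toy input, `R` has two different neighbours on each side and is set aside as ambiguous, while `A`, `B`, `C`, and `D` would form the draft scaffolds first.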
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China.
| | - Mengna Lyu
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Ranran Chen
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Xiaohong Zhang
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 454000, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
|
29
|
Goldstein LD, Chen YJJ, Wu J, Chaudhuri S, Hsiao YC, Schneider K, Hoi KH, Lin Z, Guerrero S, Jaiswal BS, Stinson J, Antony A, Pahuja KB, Seshasayee D, Modrusan Z, Hötzel I, Seshagiri S. Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun Biol 2019; 2:304. [PMID: 31428692 PMCID: PMC6689056 DOI: 10.1038/s42003-019-0551-y] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 01/17/2019] [Accepted: 07/15/2019] [Indexed: 01/24/2023] Open
Abstract
Obtaining full-length antibody heavy- and light-chain variable regions from individual B cells at scale remains a challenging problem. Here we use high-throughput single-cell B-cell receptor sequencing (scBCR-seq) to obtain accurately paired full-length variable regions in a massively parallel fashion. We sequenced more than 250,000 B cells from rat, mouse and human repertoires to characterize their lineages and expansion. In addition, we immunized rats with chicken ovalbumin and profiled antigen-reactive B cells from lymph nodes of immunized animals. The scBCR-seq data recovered 81% (n = 56/69) of B-cell lineages identified from hybridomas generated from the same set of B cells subjected to scBCR-seq. Importantly, scBCR-seq identified an additional 710 candidate lineages not recovered as hybridomas. We synthesized, expressed and tested 93 clones from the identified lineages and found that 99% (n = 92/93) of the clones were antigen-reactive. Our results establish scBCR-seq as a powerful tool for antibody discovery.
Affiliation(s)
- Leonard D. Goldstein
- Molecular Biology, Genentech, South San Francisco, CA 94080 USA
- Bioinformatics & Computational Biology, Genentech, South San Francisco, CA 94080 USA
| | | | - Jia Wu
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | | | - Yi-Chun Hsiao
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Kellen Schneider
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Kam Hon Hoi
- Bioinformatics & Computational Biology, Genentech, South San Francisco, CA 94080 USA
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Zhonghua Lin
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Steve Guerrero
- Bioinformatics & Computational Biology, Genentech, South San Francisco, CA 94080 USA
| | | | - Jeremy Stinson
- Molecular Biology, Genentech, South San Francisco, CA 94080 USA
| | - Aju Antony
- Department of Molecular Biology, SciGenom Labs, Cochin, Kerala 682037 India
| | | | - Dhaya Seshasayee
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Zora Modrusan
- Molecular Biology, Genentech, South San Francisco, CA 94080 USA
| | - Isidro Hötzel
- Antibody Engineering, Genentech, South San Francisco, CA 94080 USA
| | - Somasekar Seshagiri
- Molecular Biology, Genentech, South San Francisco, CA 94080 USA
- Present Address: SciGenom Research Foundation, Bangalore, 560099 India
|
30
|
James BT, Luczak BB, Girgis HZ. MeShClust: an intelligent tool for clustering DNA sequences. Nucleic Acids Res 2019; 46:e83. [PMID: 29718317 PMCID: PMC6101578 DOI: 10.1093/nar/gky315] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/04/2018] [Accepted: 04/13/2018] [Indexed: 11/13/2022] Open
Abstract
Sequence clustering is a fundamental step in analyzing DNA sequences. Widely used software tools for sequence clustering utilize greedy approaches that are not guaranteed to produce the best results. These tools are sensitive to one parameter that determines the similarity among sequences in a cluster. Often, a biologist may not know the exact sequence similarity. Therefore, clusters produced by these tools are unlikely to match the real clusters in the data if the provided parameter is inaccurate. To overcome this limitation, we adapted the mean shift algorithm, an unsupervised machine-learning algorithm, which has been used successfully thousands of times in fields such as image processing and computer vision. The theory behind the mean shift algorithm, unlike the greedy approaches, guarantees convergence to the modes, i.e. cluster centers. Here we describe the first application of the mean shift algorithm to clustering DNA sequences. MeShClust is one of few applications of the mean shift algorithm in bioinformatics. Further, we applied supervised machine learning to predict the identity score produced by global alignment using alignment-free methods. We demonstrate MeShClust's ability to cluster DNA sequences with high accuracy even when the sequence similarity parameter provided by the user is not very accurate.
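To make the mean shift idea concrete, here is a minimal sketch of the generic algorithm with a flat kernel on one-dimensional numbers. It is not MeShClust's implementation: the 1-D points stand in for whatever sequence representation and similarity measure the tool uses, and the names `mean_shift_1d`, `group_by_mode`, `bandwidth`, and `merge_dist` are assumptions made for this example.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Move each point to the mean of its neighbours until it stops moving.

    Returns one converged position (mode estimate) per input point.
    Uses a flat kernel: every point within `bandwidth` counts equally.
    """
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(neighbours) / len(neighbours)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def group_by_mode(points, modes, merge_dist):
    """Group points whose converged modes lie within `merge_dist` of each other."""
    clusters = []  # list of (mode, members) pairs
    for p, m in zip(points, modes):
        for mode, members in clusters:
            if abs(mode - m) <= merge_dist:
                members.append(p)
                break
        else:
            clusters.append((m, [p]))
    return clusters


data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
modes = mean_shift_1d(data, bandwidth=1.0)
clusters = group_by_mode(data, modes, merge_dist=0.5)
```

Points are never assigned to a cluster greedily; each one drifts toward the densest nearby region, and clusters fall out of where the points end up. With the toy data above, the six values settle around two modes, one near 1.0 and one near 5.0.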
Affiliation(s)
- Benjamin T James
- Bioinformatics Toolsmith Laboratory, Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA.,Mathematics Department, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA
| | - Brian B Luczak
- Bioinformatics Toolsmith Laboratory, Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA.,Mathematics Department, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA
| | - Hani Z Girgis
- Bioinformatics Toolsmith Laboratory, Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, OK 74104, USA
|
31
|
Wang A, Wang Z, Li Z, Li LM. BAUM: improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach. Bioinformatics 2019; 34:2019-2028. [PMID: 29346504 DOI: 10.1093/bioinformatics/bty020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/29/2017] [Accepted: 01/12/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though single-molecule real-time sequencing technology is very promising for overcoming the uncertainty issue, its relatively high cost and error rate add a burden on budget or computation. Many long-read assemblers adopt the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Results: Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. Tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement in genome size and continuity. In addition, BAUM reconstructed a considerable amount of repetitive regions that existing short read assemblers failed to assemble. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability and implementation: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anqi Wang
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Zhanyu Wang
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Zheng Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Lei M Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China.,Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
32
|
Barsakis K, Babrzadeh F, Chi A, Mallempati K, Pickle W, Mindrinos M, Fernández-Viña MA. Complete nucleotide sequence characterization of DRB5 alleles reveals a homogeneous allele group that is distinct from other DRB genes. Hum Immunol 2019; 80:437-448. [PMID: 30954494 PMCID: PMC6622178 DOI: 10.1016/j.humimm.2019.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/04/2019] [Revised: 03/23/2019] [Accepted: 04/01/2019] [Indexed: 01/28/2023]
Abstract
Next Generation Sequencing allows for testing and typing of entire genes of the HLA region. A better and more comprehensive sequence assessment can be achieved by the inclusion of full gene sequences of all the common alleles at a given locus. The common alleles of DRB5 are under-characterized, with the full exon-intron sequence of two alleles available. In the present study, the DRB5 genes from 18 subjects were cloned and sequenced; haplotype analysis showed that 17 of them had a single copy of DRB5 and one consanguineous subject was homozygous at all HLA loci. Methodological approaches including robust and efficient long-range PCR amplification, molecular cloning, nucleotide sequencing and de novo sequence assembly were combined to characterize DRB5 alleles. DRB5 sequences covering from the 5'UTR to the end of intron 5 were obtained for DRB5*01:01, 01:02 and 02:02; partial coverage including a segment spanning exon 2 to exon 6 was obtained for DRB5*01:03, 01:08N and 02:03. Phylogenetic analysis of the generated sequences showed that the DRB5 alleles group together and have distinctive differences from other DRB loci. Novel intron variants of DRB5*01:01:01, 01:02 and 02:02 were identified. The newly characterized DRB5 intron variants of each DRB5 allele were found in subjects harboring distinct associations with alleles of DRB1, B and/or ethnicity. The new information from this study provides reference sequences for HLA typing methodologies. Extending sequence coverage may help identify the disease susceptibility factors of DRB5-containing haplotypes, while the unexpected intron variations may shed light on the evolution of the DRB region.
Affiliation(s)
- Konstantinos Barsakis
- Stanford Blood Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA; Department of Biology, University of Crete, Heraklion, Crete 71003, Greece
| | - Farbod Babrzadeh
- Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA
| | - Anjo Chi
- Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA
| | - Kalyan Mallempati
- Stanford Blood Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA
| | - William Pickle
- Stanford Blood Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA
| | - Michael Mindrinos
- Stanford Genome Technology Center, Stanford University School of Medicine, Palo Alto, CA 94304, USA
|
33
|
Garcia LE, Zubko MK, Zubko EI, Sanchez-Puerta MV. Elucidating genomic patterns and recombination events in plant cybrid mitochondria. PLANT MOLECULAR BIOLOGY 2019; 100:433-450. [PMID: 30968307 DOI: 10.1007/s11103-019-00869-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/02/2019] [Accepted: 04/01/2019] [Indexed: 05/17/2023]
Abstract
KEY MESSAGE: Cybrid plant mitochondria undergo homologous recombination, mainly BIR, keep a single allele for each gene, and maintain exclusive sequences of each parent and a single copy of the homologous regions. The maintenance of a dynamic equilibrium between the mitochondrial and nuclear genomes requires continuous communication and a high level of compatibility between them, so that alterations in one genetic compartment need adjustments in the other. The co-evolution of nuclear and mitochondrial genomes has been poorly studied, even though the consequences and effects of this interaction are highly relevant for human health, as well as for crop improvement programs and for genetic engineering. The mitochondria of plants represent an excellent system to understand the mechanisms of genomic rearrangements, chimeric gene formation, incompatibility between nucleus and cytoplasm, and horizontal gene transfer. We carried out detailed analyses of the mtDNA of a repeated cybrid between the Solanaceae Nicotiana tabacum and Hyoscyamus niger. The size of the cybrid mtDNA was intermediate between the size of the parental mtDNAs and the sum of them. Notably, most of the homologous sequences inherited from both parents were lost. In contrast, the majority of the sequences exclusive to a single parent were maintained. The mitochondrial gene content included a majority of N. tabacum-derived genes, but also chimeric, two-parent derived, and H. niger-derived genes in a tobacco nuclear background. Any of these alterations in the gene content could be the cause of cytoplasmic male sterility (CMS) in the cybrid. The parental mtDNAs interacted through 28 homologous recombination events and a single case of illegitimate recombination. Three main homologous recombination mechanisms were recognized in the cybrid mitochondria. The break-induced replication (BIR) pathway was the most frequent. We propose that BIR could be one of the mechanisms responsible for the loss of the majority of the repeated regions derived from H. niger.
Affiliation(s)
- Laura E Garcia
- Facultad de Ciencias Agrarias, IBAM, Universidad Nacional de Cuyo, CONICET, Almirante Brown 500, M5528AHB, Chacras de Coria, Argentina.
- Facultad de Ciencias Exactas y Naturales, Universidad Nacional de Cuyo, 5500, Mendoza, Argentina.
| | - Mikhajlo K Zubko
- Centre for Bioscience, Faculty of Science and Engineering, Manchester Metropolitan University, Manchester, M1 5GD, UK
| | - Elena I Zubko
- Centre for Bioscience, Faculty of Science and Engineering, Manchester Metropolitan University, Manchester, M1 5GD, UK
| | - M Virginia Sanchez-Puerta
- Facultad de Ciencias Agrarias, IBAM, Universidad Nacional de Cuyo, CONICET, Almirante Brown 500, M5528AHB, Chacras de Coria, Argentina
- Facultad de Ciencias Exactas y Naturales, Universidad Nacional de Cuyo, 5500, Mendoza, Argentina
|
34
|
GAAP: A Genome Assembly + Annotation Pipeline. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4767354. [PMID: 31346518 PMCID: PMC6617929 DOI: 10.1155/2019/4767354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/24/2019] [Revised: 05/20/2019] [Accepted: 05/26/2019] [Indexed: 12/24/2022]
Abstract
Genomic analysis begins with de novo assembly of short-read fragments in order to reconstruct full-length base sequences without exploiting a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools have been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their objects of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter setting methods, and input formats; thus, it remains difficult for individual researchers to select, utilize, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.
|
35
|
The complete organelle genomes of Physochlaina orientalis: Insights into short sequence repeats across seed plant mitochondrial genomes. Mol Phylogenet Evol 2019; 137:274-284. [PMID: 31112782 DOI: 10.1016/j.ympev.2019.05.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/29/2019] [Revised: 05/14/2019] [Accepted: 05/17/2019] [Indexed: 11/24/2022]
Abstract
Short repeats (SR) play an important role in shaping seed plant mitochondrial genomes (mtDNAs). However, their origin, distribution, and relationships across the different plant lineages remain unresolved. We focus on the angiosperm family Solanaceae that shows great variation in repeat content and extend the study to a wide diversity of seed plants. We determined the complete nucleotide sequences of the organellar genomes of the medicinal plant Physochlaina orientalis (Solanaceae), member of the tribe Hyoscyameae. To understand the evolution of the P. orientalis mtDNA we made comparisons with those of five other Solanaceae. P. orientalis mtDNA presents the largest mitogenome (∼685 kb in size) among the Solanaceae and has an unprecedented 8-copy repeat family of ∼8.2 kb in length and a great number of SR arranged in tandem-like structures. We found that the SR in the Solanaceae share a common origin, but these only expanded in members of the tribe Hyoscyameae. We discuss a mechanism that could explain SR formation and expansion in P. orientalis and Hyoscyamus niger. Finally, the great increase in plant mitochondrial data allowed us to systematically extend our repeat analysis to a total of 136 seed plants to characterize and analyze for the first time families of SR among seed plant mtDNAs.
|
36
|
Kwon D, Lee J, Kim J. GMASS: a novel measure for genome assembly structural similarity. BMC Bioinformatics 2019; 20:147. [PMID: 30885117 PMCID: PMC6423833 DOI: 10.1186/s12859-019-2710-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/26/2018] [Accepted: 03/03/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Thanks to recent advancements in next-generation sequencing (NGS) technologies, large amounts of genomic data, in the form of short DNA sequences known as reads, have been accumulating. Diverse assemblers have been developed to generate high-quality de novo assemblies from NGS reads, but their outputs differ considerably because of algorithmic differences. However, there are no properly structured measures to show the similarity or difference between assemblies. RESULTS We developed a new measure, called the GMASS score, for comparing two genome assemblies in terms of their structure. The GMASS score was developed based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies. The new measure was able to show structural similarity between assemblies when evaluated on simulated assembly datasets. The application of the GMASS score to compare assemblies in recently published benchmark datasets showed the divergent performance of current assemblers as well as its ability to compare assemblies. CONCLUSION The GMASS score is a novel measure for representing structural similarity between two assemblies. It will contribute to the understanding of assembly output and to the development of de novo assemblers.
Affiliation(s)
- Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea.
|
37
|
Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res 2018; 45:e18. [PMID: 28204566 PMCID: PMC5389512 DOI: 10.1093/nar/gkw955] [Citation(s) in RCA: 1220] [Impact Index Per Article: 203.3] [Received: 07/19/2016] [Revised: 10/01/2016] [Accepted: 10/11/2016] [Indexed: 11/26/2022] Open
Abstract
The evolution in next-generation sequencing (NGS) technology has led to the development of many different assembly algorithms, but few of them focus on assembling the organelle genomes. These genomes are used in phylogenetic studies, food identification and are the most deposited eukaryotic genomes in GenBank. Producing organelle genome assembly from whole genome sequencing (WGS) data would be the most accurate and least laborious approach, but a tool specifically designed for this task is lacking. We developed a seed-and-extend algorithm that assembles organelle genomes from whole genome sequencing (WGS) data, starting from a related or distant single seed sequence. The algorithm has been tested on several new (Gonioctena intermedia and Avicennia marina) and public (Arabidopsis thaliana and Oryza sativa) whole genome Illumina data sets where it outperforms known assemblers in assembly accuracy and coverage. In our benchmark, NOVOPlasty assembled all tested circular genomes in less than 30 min with a maximum memory requirement of 16 GB and an accuracy over 99.99%. In conclusion, NOVOPlasty is the sole de novo assembler that provides a fast and straightforward extraction of the extranuclear genomes from WGS data in one circular high quality contig. The software is open source and can be downloaded at https://github.com/ndierckx/NOVOPlasty.
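The seed-and-extend idea mentioned in this abstract can be illustrated with a toy greedy extender. This is only a minimal sketch under simplifying assumptions (exact suffix/prefix overlaps, one direction of extension, no error handling or circularization); it is not NOVOPlasty's actual implementation, and the function name `extend_seed` and the parameter `min_overlap` are hypothetical.

```python
from typing import List


def extend_seed(seed: str, reads: List[str], min_overlap: int = 3) -> str:
    """Greedily extend `seed` to the right using exact read overlaps.

    Illustrative only: at each step, pick the unused read whose prefix
    has the longest exact match with the current contig suffix, then
    append the non-overlapping remainder of that read.
    """
    contig = seed
    unused = list(reads)
    while True:
        best_read, best_overlap = None, 0
        for read in unused:
            # The read must contribute at least one new base.
            max_k = min(len(read) - 1, len(contig))
            for k in range(max_k, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if k > best_overlap:
                        best_read, best_overlap = read, k
                    break  # longest overlap for this read found
        if best_read is None:
            return contig  # no read extends the contig any further
        contig += best_read[best_overlap:]
        unused.remove(best_read)


if __name__ == "__main__":
    reads = ["GGCTTA", "TTACGG", "CCCCCC"]
    print(extend_seed("ATGGC", reads))
```

A real organelle assembler would also extend in both directions, tolerate sequencing errors, and detect when the contig closes into a circle; those steps are deliberately left out here.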
Affiliation(s)
- Nicolas Dierckxsens
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles and Vrije Universiteit Brussel, Triomflaan CP 263, 1050 Brussels, Belgium
|
38
|
Yoon S, Kim D, Kang K, Park WJ. TraRECo: a greedy approach based de novo transcriptome assembler with read error correction using consensus matrix. BMC Genomics 2018; 19:653. [PMID: 30180798 PMCID: PMC6123912 DOI: 10.1186/s12864-018-5034-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/10/2017] [Accepted: 08/23/2018] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND The challenges when developing a good de novo transcriptome assembler include how to deal with read errors and sequence repeats. Almost all de novo assemblers utilize a de Bruijn graph, whose complexity grows linearly with data size but which suffers from errors and repeats. Although one can correct the errors by inspecting the topological structure of the graph, this is not an easy task when there are too many branches. Two research directions are to improve either the graph reliability or the path search precision, and in this study, we focused on the former. RESULTS We present TraRECo, a greedy approach to de novo assembly employing error-aware graph construction. In the proposed approach, we built contigs by direct read alignment within a distance margin and performed a junction search to construct splicing graphs. While doing so, a contig of length l was represented by a 4 × l matrix (called a consensus matrix), in which each element was the base count of the reads aligned so far. A representative sequence was obtained by taking the majority in each column of the consensus matrix, to be used for further read alignment. Once the splicing graphs had been obtained, we used IsoLasso to find paths with a noticeable read depth. The experiments using real and simulated reads show that the method provided considerable improvement in sensitivity and moderately better performance when comparing sensitivity and precision. This was achieved by the error-aware graph construction using the consensus matrix, with which reads containing errors were made usable for the graph construction (otherwise, they might eventually have been discarded). This improved the quality of the coverage depth information used in the subsequent path search step and, finally, the reliability of the graph. CONCLUSIONS De novo assembly is mainly used to explore undiscovered isoforms and must be able to represent as many reads as possible in an efficient way. In this sense, TraRECo provides us with a potential alternative for improving graph reliability even though the computational burden is much higher than the single k-mer in the de Bruijn graph approach.
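The consensus matrix described in this abstract (a 4 × l table of per-base counts, with the representative sequence taken as the column-wise majority) can be sketched as follows. This is a minimal illustration, not TraRECo's code: the function names, the fixed `ACGT` row order, the use of `N` for uncovered columns, and the tie-breaking rule are all assumptions made for the example.

```python
from typing import List

BASES = "ACGT"  # row order of the consensus matrix (an assumption)


def empty_consensus(length: int) -> List[List[int]]:
    """Return a 4 x length matrix of zero counts, one row per base."""
    return [[0] * length for _ in BASES]


def add_read(matrix: List[List[int]], read: str, offset: int) -> None:
    """Add an aligned read to the counts, starting at column `offset`.

    Bases outside ACGT and positions outside the contig are ignored.
    """
    width = len(matrix[0])
    for i, base in enumerate(read):
        col = offset + i
        if base in BASES and 0 <= col < width:
            matrix[BASES.index(base)][col] += 1


def representative(matrix: List[List[int]]) -> str:
    """Take the majority base per column; uncovered columns become 'N'.

    Ties are broken by BASES order, which is an arbitrary choice here.
    """
    out = []
    for col in range(len(matrix[0])):
        counts = [matrix[row][col] for row in range(len(BASES))]
        best = max(counts)
        out.append(BASES[counts.index(best)] if best > 0 else "N")
    return "".join(out)


if __name__ == "__main__":
    m = empty_consensus(6)
    add_read(m, "ACGTAC", 0)
    add_read(m, "ACGAAC", 0)  # carries one mismatch at column 3
    add_read(m, "CGTAC", 1)
    print(representative(m))
```

In the example, the read with a mismatch still contributes its correct bases to the counts, while the majority vote at column 3 outvotes the error, which is the sense in which erroneous reads remain usable.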
Affiliation(s)
- Seokhyun Yoon
- Department of Electronics Eng., College of Engineering, Dankook University, Yongin-si, Korea
| | - Daeseung Kim
- Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea
| | - Keunsoo Kang
- Department of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea.
| | - Woong June Park
- Department of Molecular Biology, College of Natural Sciences, Dankook University, Cheonan-si, Korea
|
39
|
Ryšavý P, Železný F. Estimating sequence similarity from read sets for clustering next-generation sequencing data. Data Min Knowl Discov 2018. [DOI: 10.1007/s10618-018-0584-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
40
|
Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018; 16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022] Open
Abstract
The application of advanced sequencing technologies and the rapid growth of various sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphism occur frequently in genomes, and each of these has different impacts on assembly. Further, many new applications for sequencing, such as metagenomics regarding multiple species, have emerged in recent years. These not only give rise to higher complexity but also prevent short-read assembly in an efficient way. This article reviews the theoretical foundations that underlie current mapping-based assembly and de novo-based assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey primary algorithms/software and offer a discussion on the emerging challenges in assembly.
|
41
|
Bengtsson-Palme J, Larsson DGJ, Kristiansson E. Using metagenomics to investigate human and environmental resistomes. J Antimicrob Chemother 2018; 72:2690-2703. [PMID: 28673041 DOI: 10.1093/jac/dkx199] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Indexed: 12/12/2022] Open
Abstract
Antibiotic resistance is a global health concern declared by the WHO as one of the largest threats to modern healthcare. In recent years, metagenomic DNA sequencing has started to be applied as a tool to study antibiotic resistance in different environments, including the human microbiota. However, a multitude of methods exist for metagenomic data analysis, and not all methods are suitable for the investigation of resistance genes, particularly if the desired outcome is an assessment of risks to human health. In this review, we outline the current state of methods for sequence handling, mapping to databases of resistance genes, statistical analysis and metagenomic assembly. In addition, we provide an overview of important considerations related to the analysis of resistance genes, and recommend some of the currently used tools and methods that are best equipped to inform research and clinical practice related to antibiotic resistance.
Affiliation(s)
- Johan Bengtsson-Palme
- Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden.,Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
| | - D G Joakim Larsson
- Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden.,Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
| | - Erik Kristiansson
- Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden.,Department of Mathematical Sciences, Chalmers University of Technology, SE-41296, Gothenburg, Sweden
|
42
|
Khan AR, Pervez MT, Babar ME, Naveed N, Shoaib M. A Comprehensive Study of De Novo Genome Assemblers: Current Challenges and Future Prospective. Evol Bioinform Online 2018; 14:1176934318758650. [PMID: 29511353 PMCID: PMC5826002 DOI: 10.1177/1176934318758650] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 08/02/2017] [Accepted: 01/19/2018] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Current advancements in next-generation sequencing technology have made it possible to sequence whole genomes, but assembling a large number of short sequence reads is still a big challenge. In this article, we present a comparative study of seven assemblers, namely, ABySS, Velvet, Edena, SGA, Ray, SSAKE, and Perga, using prokaryotic and eukaryotic paired-end as well as single-end data sets from the Illumina platform. RESULTS Results showed that, for single-end data sets, Velvet and ABySS outperformed the other assemblers, with comparatively low assembly time and high genome fraction. Velvet consumed less memory than any other assembler. For paired-end data sets, Velvet took the least time and produced a high genome fraction, behind ABySS and Ray. In terms of low memory usage, SGA and Edena outperformed all other assemblers. Ray also showed good genome fraction; however, its extremely long assembly time might make it prohibitively slow on larger single-end and paired-end data sets. CONCLUSIONS Our comparison study will help scientists select a suitable assembler for their data sets and will also help developers upgrade existing assemblers or develop new ones for de novo assembly.
Affiliation(s)
- Abdul Rafay Khan
- Department of Bioinformatics and Computational Biology, Virtual University of Pakistan, Lahore, Pakistan
| | - Muhammad Tariq Pervez
- Department of Bioinformatics and Computational Biology, Virtual University of Pakistan, Lahore, Pakistan
| | | | - Nasir Naveed
- Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan
| | - Muhammad Shoaib
- Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
|
43
|
Abstract
Structural variations (SVs) are an important type of genomic variant and always play a critical role in cancer development and progression. In the cancer genomics era, detecting structural variations from short sequencing data is still challenging. We developed a novel algorithm, novoBreak (Chong et al. Nat Methods 14:65-67, 2017), which achieved the highest balanced accuracy (mean of sensitivity and precision) in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge. Here we describe detailed instructions for applying novoBreak (https://github.com/czc/nb_distribution), an open-source software package, to somatic SV detection. We also briefly introduce how to detect germline SVs using the novoBreak pipeline and how to use the novoBreak Workflow (https://cgc.sbgenomics.com/public/apps#ZCHONG/novobreak-commit/novobreak-analysis/) on the Seven Bridges Cancer Genomics Cloud.
Affiliation(s)
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA.
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
|
44
|
A stochastic de novo assembly algorithm for viral-sized genomes obtains correct genomes and builds consensus. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.07.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
45
|
Abstract
Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.
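Of the de novo strategies listed in this abstract, the De Bruijn graph is the easiest to show in a few lines. The sketch below is a generic textbook construction, not code from the reviewed article: each k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name `de_bruijn` and the choice to keep duplicate edges as a crude coverage signal are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Build a De Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge
    from its prefix node to its suffix node. Duplicate edges are kept,
    so edge multiplicity reflects how often a k-mer was observed.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn(["ACGT", "CGTA"], k=3).items():
        print(node, "->", successors)
```

Assembly then amounts to finding paths through this graph; error correction, repeat resolution, and reverse-complement handling are omitted from the sketch.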
Affiliation(s)
- Xiuhua Si
- Department of Computer Science & Technology, Heilongjiang University, Harbin, China
| | - Qian Wang
- Shandong Aerospace Institute of Electronic Technology, Yantai, China
| | - Lei Zhang
- Department of Computer Science & Technology, Heilongjiang University, Harbin, China
| | - Ruo Wu
- Department of Computer Science & Technology, Heilongjiang University, Harbin, China
| | - Jiquan Ma
- Department of Computer Science & Technology, Heilongjiang University, Harbin, China
|
46
|
Li M, Wu B, Yan X, Luo J, Pan Y, Wu FX, Wang J. PECC: Correcting contigs based on paired-end read distribution. Comput Biol Chem 2017; 69:178-184. [DOI: 10.1016/j.compbiolchem.2017.03.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/27/2017] [Accepted: 03/27/2017] [Indexed: 11/26/2022]
|
47
|
Li M, Liao Z, He Y, Wang J, Luo J, Pan Y. ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:916-925. [PMID: 27076460 DOI: 10.1109/tcbb.2016.2550433] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
The purpose of de novo assembly is to report more contiguous, complete, and less error-prone contigs. Thanks to the advent of next-generation sequencing (NGS) technologies, the cost of producing high-depth reads has been greatly reduced. However, due to the disadvantages of NGS, de novo assembly has to face the difficulties brought by repeat regions, error rate, and low sequencing coverage in some regions. Although many de novo algorithms have been proposed to solve these problems, de novo assembly still remains a challenge. In this article, we developed an iterative seed-extension algorithm for de novo assembly, called ISEA. To avoid the negative impact induced by error rate, ISEA utilizes read overlap and paired-end information to correct erroneous reads before assembling. While extending seeds in a De Bruijn graph, ISEA uses an elaborately designed score function based on paired-end information and the distribution of insert size to solve the repeat region problem. By employing the distribution of insert size, the score function can also reduce the influence of erroneous reads. In scaffolding, ISEA adopts a relaxed strategy to join contigs that were terminated for low coverage during the extension. The performance of ISEA was compared with six popular existing assemblers on four real datasets. The experimental results demonstrate that ISEA can effectively obtain longer and more accurate scaffolds.
|
48
|
Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017; 40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022] Open
Abstract
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations of short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. Previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even a bacterial one, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Affiliation(s)
- Frederico Schmitt Kremer
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
| | - Alan John Alexander McBride
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
| | - Luciano da Silva Pinto
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
|
49
|
Petersen G, Cuenca A, Zervas A, Ross GT, Graham SW, Barrett CF, Davis JI, Seberg O. Mitochondrial genome evolution in Alismatales: Size reduction and extensive loss of ribosomal protein genes. PLoS One 2017; 12:e0177606. [PMID: 28545148 PMCID: PMC5435185 DOI: 10.1371/journal.pone.0177606] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 11/23/2016] [Accepted: 04/28/2017] [Indexed: 11/18/2022] Open
Abstract
The order Alismatales is a hotspot for evolution of plant mitochondrial genomes characterized by remarkable differences in genome size, substitution rates, RNA editing, retrotranscription, gene loss and intron loss. Here we have sequenced the complete mitogenomes of Zostera marina and Stratiotes aloides, which together with previously sequenced mitogenomes from Butomus and Spirodela, provide new evolutionary evidence of genome size reduction, gene loss and transfer to the nucleus. The Zostera mitogenome includes a large portion of DNA transferred from the plastome, yet it is the smallest known mitogenome from a non-parasitic plant. Using a broad sample of the Alismatales, the evolutionary history of ribosomal protein gene loss is analyzed. In Zostera almost all ribosomal protein genes are lost from the mitogenome, but only some can be found in the nucleus.
Affiliation(s)
- Gitte Petersen
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Argelia Cuenca
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Athanasios Zervas
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Gregory T. Ross
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- UBC Botanical Garden & Centre for Plant Research, University of British Columbia, Vancouver, British Columbia, Canada
| | - Sean W. Graham
- Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
- UBC Botanical Garden & Centre for Plant Research, University of British Columbia, Vancouver, British Columbia, Canada
| | - Craig F. Barrett
- L. H. Bailey Hortorium and Plant Biology Section, Cornell University, Ithaca, New York, United States of America
| | - Jerrold I. Davis
- L. H. Bailey Hortorium and Plant Biology Section, Cornell University, Ithaca, New York, United States of America
| | - Ole Seberg
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
|
50
|
Villela LCV, Alves AL, Varela ES, Yamagishi MEB, Giachetto PF, da Silva NMA, Ponzetto JM, Paiva SR, Caetano AR. Complete mitochondrial genome from South American catfish Pseudoplatystoma reticulatum (Eigenmann & Eigenmann) and its impact in Siluriformes phylogenetic tree. Genetica 2017; 145:51-66. [DOI: 10.1007/s10709-016-9945-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/10/2016] [Accepted: 12/22/2016] [Indexed: 01/08/2023]
|