1
Fuhrmann L, Langer B, Topolsky I, Beerenwinkel N. VILOCA: sequencing quality-aware viral haplotype reconstruction and mutation calling for short-read and long-read data. NAR Genom Bioinform 2024; 6:lqae152. [PMID: 39633724 PMCID: PMC11616694 DOI: 10.1093/nargab/lqae152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 09/15/2024] [Accepted: 10/25/2024] [Indexed: 12/07/2024]
Abstract
RNA viruses exist as large heterogeneous populations within their host. The structure and diversity of virus populations affect disease progression and treatment outcomes. Next-generation sequencing allows detailed viral population analysis, but inferring diversity from error-prone reads is challenging. Here, we present VILOCA (VIral LOcal haplotype reconstruction and mutation CAlling for short and long read data), a method for mutation calling and reconstruction of local haplotypes from short- and long-read viral sequencing data. Local haplotypes refer to genomic regions that have approximately the length of the input reads. VILOCA recovers local haplotypes by using a Dirichlet process mixture model to cluster reads around their unobserved haplotypes and by leveraging the quality scores of the sequencing reads. We assessed the performance of VILOCA in terms of mutation calling and haplotype reconstruction accuracy on simulated and experimental Illumina, PacBio and Oxford Nanopore data. On simulated and experimental Illumina data, VILOCA performed better than or similarly to existing methods. On the simulated long-read data, VILOCA is able to recover on average [Formula: see text] of the ground truth mutations with perfect precision, compared to only [Formula: see text] recall and [Formula: see text] precision for the second-best method. In summary, VILOCA provides significantly improved accuracy in mutation and haplotype calling, especially for long-read sequencing data, and therefore facilitates the comprehensive characterization of heterogeneous within-host viral populations.
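The abstract describes VILOCA as a method that clusters reads around unobserved haplotypes while "leveraging quality scores". One way to picture that idea is a per-read likelihood in which each base's Phred score sets its error probability. The sketch below is not VILOCA's code. It assumes aligned, gap-free reads, a uniform substitution model and a fixed set of candidate haplotypes, all of which are hypothetical simplifications. It also uses a finite-mixture E-step in place of the full Dirichlet process. The point it illustrates is that a mismatch at a low-quality base weakens a read's assignment less than a mismatch at a high-quality base.

```python
import math


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_log_likelihood(read: str, quals: list[int], haplotype: str) -> float:
    """Log-likelihood of an aligned read given one candidate haplotype.

    Assumes read and haplotype window are aligned, gap-free and of equal
    length, and that a miscalled base is equally likely to be any of the
    three other nucleotides (a simplifying assumption).
    """
    if not (len(read) == len(quals) == len(haplotype)):
        raise ValueError("read, quals and haplotype must have equal length")
    ll = 0.0
    for base, q, hap_base in zip(read, quals, haplotype):
        eps = phred_to_error(q)
        # A match costs little; a mismatch costs more when the call is confident.
        ll += math.log(1 - eps) if base == hap_base else math.log(eps / 3)
    return ll


def responsibilities(read: str, quals: list[int],
                     haplotypes: list[str], weights: list[float]) -> list[float]:
    """Posterior probability that the read came from each haplotype.

    This is the E-step of a finite mixture; a Dirichlet process mixture would
    additionally reserve probability mass for opening a new cluster.
    """
    logs = [math.log(w) + read_log_likelihood(read, quals, h)
            for h, w in zip(haplotypes, weights)]
    top = max(logs)  # subtract the maximum for numerical stability
    scaled = [math.exp(x - top) for x in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


haps = ["ACGT", "ACGA"]  # hypothetical candidate haplotypes
print(responsibilities("ACGT", [40, 40, 40, 40], haps, [0.5, 0.5]))
print(responsibilities("ACGT", [40, 40, 40, 5], haps, [0.5, 0.5]))
```

In the second call the only base that separates the two haplotypes has quality 5. The read therefore still favours `ACGT`, but far less decisively than in the first call.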
Affiliation(s)
- Lara Fuhrmann
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, Lausanne 1015, Switzerland
- Benjamin Langer
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, Basel 4056, Switzerland
- Ivan Topolsky
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, Lausanne 1015, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, Basel 4056, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, Lausanne 1015, Switzerland
2
Jochheim A, Jochheim FA, Kolodyazhnaya A, Morice É, Steinegger M, Söding J. Strain-resolved de-novo metagenomic assembly of viral genomes and microbial 16S rRNAs. Microbiome 2024; 12:187. [PMID: 39354646 PMCID: PMC11443906 DOI: 10.1186/s40168-024-01904-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 08/07/2024] [Indexed: 10/03/2024]
Abstract
BACKGROUND Metagenomics is a powerful approach to study environmental and human-associated microbial communities and, in particular, the role of viruses in shaping them. Viral genomes are challenging to assemble from metagenomic samples due to their genomic diversity caused by high mutation rates. In the standard de Bruijn graph assemblers, this genomic diversity leads to complex k-mer assembly graphs with a plethora of loops and bulges that are challenging to resolve into strains or haplotypes because variants more than the k-mer size apart cannot be phased. In contrast, overlap assemblers can phase variants as long as they are covered by a single read. RESULTS Here, we present PenguiN, a software for strain-resolved assembly of viral DNA and RNA genomes and bacterial 16S rRNA from shotgun metagenomics. Its exhaustive detection of all read overlaps in linear time, combined with a Bayesian model to select strain-resolved extensions, allows it to assemble longer viral contigs, less fragmented genomes, and more strains than existing assembly tools, on both real and simulated datasets. We show a 3- to 40-fold increase in complete viral genomes and a 6-fold increase in bacterial 16S rRNA genes. CONCLUSION PenguiN is the first overlap-based assembler for viral genome and 16S rRNA assembly from large and complex metagenomic datasets, which we hope will facilitate studying the key roles of viruses in microbial communities.
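A toy example can check the abstract's claim that de Bruijn graphs cannot phase variants lying more than the k-mer size apart. This is not PenguiN's algorithm, and the backbone sequence, k = 5 and the variant positions are hypothetical. Because no k-mer spans both variant sites, the two true strains and the two recombinants that swap their alleles produce identical k-mer sets. A single read covering both sites, by contrast, tells them apart.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq, i.e. the nodes a de Bruijn-style graph sees."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def with_alleles(ref: str, alleles: dict[int, str]) -> str:
    """Copy of ref with the given 0-based positions substituted."""
    seq = list(ref)
    for pos, base in alleles.items():
        seq[pos] = base
    return "".join(seq)


ref = "ACGTACGGTCATTGCAGTCA"  # hypothetical 20 bp backbone
k = 5
i, j = 3, 12                   # two variant sites, j - i >= k

# Two true strains and the two "recombinants" that swap their alleles.
strain_a = with_alleles(ref, {i: "A", j: "G"})
strain_b = with_alleles(ref, {i: "C", j: "C"})
recomb_c = with_alleles(ref, {i: "A", j: "C"})
recomb_d = with_alleles(ref, {i: "C", j: "G"})

# No k-mer spans both sites, so both pairs yield exactly the same k-mer set.
same_kmers = (kmers(strain_a, k) | kmers(strain_b, k)) == (kmers(recomb_c, k) | kmers(recomb_d, k))

# A read covering both sites links the alleles and rules out the recombinants.
read = strain_a[i - 1:j + 2]
phased = read in strain_a and read not in recomb_c and read not in recomb_d
print(same_kmers, phased)
```

The general argument does not depend on this particular sequence. A window of length k covers both sites only when their distance is at most k - 1. Every k-mer of a true strain therefore also occurs in one of the recombinants, and vice versa.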
Affiliation(s)
- Annika Jochheim
- Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- International Max-Planck Research School for Genome Sciences, University of Göttingen, Göttingen, Germany
- Florian A Jochheim
- International Max-Planck Research School for Genome Sciences, University of Göttingen, Göttingen, Germany
- Department of Molecular Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Alexandra Kolodyazhnaya
- Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Étienne Morice
- Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- International Max-Planck Research School for Genome Sciences, University of Göttingen, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
- Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- International Max-Planck Research School for Genome Sciences, University of Göttingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Göttingen, Göttingen, Germany
3
da Silva AF, da Silva Neto AM, Aksenen C, Jeronimo P, Dezordi F, Almeida S, Costa H, Salvato R, Campos TD, Wallau G, of the Fiocruz Genomic Network OB. ViralFlow v1.0 - a computational workflow for streamlining viral genomic surveillance. NAR Genom Bioinform 2024; 6:lqae056. [PMID: 38800829 PMCID: PMC11127631 DOI: 10.1093/nargab/lqae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 04/15/2024] [Accepted: 05/09/2024] [Indexed: 05/29/2024]
Abstract
ViralFlow v1.0 is a computational workflow developed for viral genomic surveillance. Several key changes turned ViralFlow into a general-purpose reference-based genome assembler for all viruses with an available reference genome. New virus-agnostic modules were implemented to further study nucleotide and amino acid mutations. ViralFlow v1.0 runs on a broad range of computational infrastructures, from laptop computers to high-performance computing (HPC) environments, and generates standard and well-formatted outputs suited for both public health reporting and scientific problem-solving. ViralFlow v1.0 is available at: https://viralflow.github.io/index-en.html.
Affiliation(s)
- Alexandre Freitas da Silva
- Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Antonio Marinho da Silva Neto
- Data Analysis and Engineering, Genomic Surveillance Unit, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Filipe Zimmer Dezordi
- Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Hudson Marques Paula Costa
- Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Richard Steiner Salvato
- Secretaria Estadual da Saúde do Rio Grande do Sul, Centro Estadual de Vigilância em Saúde, Laboratório Central de Saúde Pública, Porto Alegre, Rio Grande do Sul 90450-190, Brazil
- Tulio de Lima Campos
- Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Gabriel da Luz Wallau
- Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Department of Arbovirology, Bernhard Nocht Institute for Tropical Medicine, WHO Collaborating Center for Arbovirus and Hemorrhagic Fever Reference and Research, National Reference Center for Tropical Infectious Diseases, Bernhard-Nocht-Strasse 74, D-20359 Hamburg, Germany
4
Liu X, Liu Y, Liu J, Zhang H, Shan C, Guo Y, Gong X, Cui M, Li X, Tang M. Correlation between the gut microbiome and neurodegenerative diseases: a review of metagenomics evidence. Neural Regen Res 2024; 19:833-845. [PMID: 37843219 PMCID: PMC10664138 DOI: 10.4103/1673-5374.382223] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 02/23/2023] [Revised: 04/19/2023] [Accepted: 06/17/2023] [Indexed: 10/17/2023]
Abstract
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always accompanies the pathological changes of neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the diverse microorganisms of the microbiota and involves both the neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Affiliation(s)
- Xiaoyan Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yi Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Institute of Animal Husbandry, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu Province, China
- Junlin Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Hantao Zhang
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Chaofan Shan
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yinglu Guo
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Xun Gong
- Department of Rheumatology & Immunology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu Province, China
- Mengmeng Cui
- Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Xiubin Li
- Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
5
Fu P, Wu Y, Zhang Z, Qiu Y, Wang Y, Peng Y. VIGA: a one-stop tool for eukaryotic virus identification and genome assembly from next-generation-sequencing data. Brief Bioinform 2023; 25:bbad444. [PMID: 38048079 PMCID: PMC10753531 DOI: 10.1093/bib/bbad444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 10/26/2023] [Accepted: 11/11/2023] [Indexed: 12/05/2023]
Abstract
Identification of viruses and further assembly of viral genomes from next-generation sequencing (NGS) data are essential steps in virome studies. This study presents a one-stop tool named VIGA (available at https://github.com/viralInformatics/VIGA) for eukaryotic virus identification and genome assembly from NGS data. It is composed of four modules, namely identification, taxonomic annotation, assembly and novel virus discovery, which integrate several third-party tools such as BLAST, Trinity, MetaCompass and RagTag. Evaluation on multiple simulated and real virome datasets showed that VIGA assembled more complete virus genomes than its competitors on both metatranscriptomic and metagenomic data and performed well in assembling virus genomes at the strain level. Finally, VIGA was used to investigate the virome in metatranscriptomic data from the Human Microbiome Project and revealed differences in virome composition and positive rate in prediabetes, Crohn's disease and ulcerative colitis. Overall, VIGA should help considerably in the identification and characterization of viromes, especially of known viruses, in future studies.
Affiliation(s)
- Ping Fu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Yifan Wu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Zhiyuan Zhang
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Ye Qiu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Yirong Wang
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
- Yousong Peng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha 410082, China
6
Shukla N, Srivastava N, Gupta R, Srivastava P, Narayan J. COVID Variants, Villain and Victory: A Bioinformatics Perspective. Microorganisms 2023; 11:2039. [PMID: 37630599 PMCID: PMC10459809 DOI: 10.3390/microorganisms11082039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2023] [Revised: 07/11/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023]
Abstract
The SARS-CoV-2 virus, a novel member of the Coronaviridae family, is responsible for the viral infection known as Coronavirus Disease 2019 (COVID-19). In response to the urgent and critical need for rapid detection, diagnosis, analysis, interpretation, and treatment of COVID-19, a wide variety of bioinformatics tools have been developed. Given the virulence of SARS-CoV-2, it is crucial to explore the pathophysiology of the virus. We intend to examine how bioinformatics, in conjunction with next-generation sequencing techniques, can be leveraged to improve current diagnostic tools and streamline vaccine development for emerging SARS-CoV-2 variants. We also emphasize how bioinformatics, in general, can contribute to critical areas of biomedicine, including clinical diagnostics, SARS-CoV-2 genomic surveillance and its evolution, identification of potential drug targets, and development of therapeutic strategies. Currently, state-of-the-art bioinformatics tools have helped overcome technical obstacles with respect to genomic surveillance and have assisted in rapid detection, diagnosis, and delivering precise treatment to individuals on time.
Affiliation(s)
- Nityendra Shukla
- CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi 110007, India
- Neha Srivastava
- Amity Institute of Biotechnology, Amity University, Uttar Pradesh, Lucknow Campus, Lucknow 226010, India
- Rohit Gupta
- CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi 110007, India
- Prachi Srivastava
- Amity Institute of Biotechnology, Amity University, Uttar Pradesh, Lucknow Campus, Lucknow 226010, India
- Jitendra Narayan
- CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi 110007, India
7
Cai X, Lan T, Ping P, Oliver B, Li J. Intra-Host Co-Existing Strains of SARS-CoV-2 Reference Genome Uncovered by Exhaustive Computational Search. Viruses 2023; 15:1065. [PMID: 37243151 DOI: 10.3390/v15051065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 04/24/2023] [Accepted: 04/24/2023] [Indexed: 05/28/2023]
Abstract
The COVID-19 pandemic caused by SARS-CoV-2 has had a severe impact on people worldwide. The reference genome of the virus has been widely used as a template for designing mRNA vaccines to combat the disease. In this study, we present a computational method aimed at identifying co-existing intra-host strains of the virus from RNA-sequencing data of short reads that were used to assemble the original reference genome. Our method consisted of five key steps: extraction of relevant reads, error correction for the reads, identification of within-host diversity, phylogenetic study, and protein binding affinity analysis. Our study revealed that multiple strains of SARS-CoV-2 can coexist in both the viral sample used to produce the reference sequence and a wastewater sample from California. Additionally, our workflow demonstrated its capability to identify within-host diversity in foot-and-mouth disease virus (FMDV). Through our research, we were able to shed light on the binding affinity and phylogenetic relationships of these strains with the published SARS-CoV-2 reference genome, SARS-CoV, variants of concern (VOC) of SARS-CoV-2, and some closely related coronaviruses. These insights have important implications for future research efforts aimed at identifying within-host diversity, understanding the evolution and spread of these viruses, as well as the development of effective treatments and vaccines against them.
Affiliation(s)
- Xinhui Cai
- Data Science Institute and School of Computer Science, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Tian Lan
- Data Science Institute and School of Computer Science, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Pengyao Ping
- Data Science Institute and School of Computer Science, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Brian Oliver
- School of Life Sciences, Faculty of Science, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Jinyan Li
- Data Science Institute and School of Computer Science, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
8
Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023]
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has progressed through three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, first- and second-generation sequencing technologies relied on fluorescence detection systems and DNA polymerase-based enzymatic systems, which increased the cost of sequencing and limited its scope of applications. Third-generation sequencing technology performs PCR-free, single-molecule sequencing, but it still depends on fluorescence detection devices. To overcome these limitations, researchers have worked hard to develop a new, portable sequencing technology represented by nanopore sequencing. Nanopore technology has the advantages of small size and portability, independence from biochemical reagents, and direct reading using physical methods. This paper reviews the research and development of nanopore sequencing technology (NST) from the laboratory to commercially viable tools, and discusses the main types of nanopore sequencing technologies and their applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools needed for different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Zepeng Sun
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jiawei Wang
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Xinlong Liu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yun Bai
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Jiang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Feng Qiao
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Chenyan Yuan
- Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
- Jingjie Sha
- School of Mechanical Engineering, Southeast University, Nanjing, China
- Jinghui Zhang
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Li-Qun Xu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China (corresponding author)
- Jian Li
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China (corresponding author)
9
Appah A, Beelen CJ, Kirkby D, Dong W, Shahid A, Foley B, Mensah M, Ganu V, Puplampu P, Amoah LE, Nii-Trebi NI, Brumme CJ, Brumme ZL. Molecular Epidemiology of HIV-1 in Ghana: Subtype Distribution, Drug Resistance and Coreceptor Usage. Viruses 2022; 15:128. [PMID: 36680168 PMCID: PMC9865111 DOI: 10.3390/v15010128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 12/27/2022] [Accepted: 12/29/2022] [Indexed: 01/03/2023]
Abstract
The greatest HIV-1 genetic diversity is found in West/Central Africa due to the pandemic’s origins in this region, but this diversity remains understudied. We characterized HIV-1 subtype diversity (from both sub-genomic and full-genome viral sequences), drug resistance and coreceptor usage in 103 predominantly (90%) antiretroviral-naive individuals living with HIV-1 in Ghana. Full-genome HIV-1 subtyping confirmed the circulating recombinant form CRF02_AG as the dominant (53.9%) subtype in the region, with the complex recombinant 06_cpx (4%) present as well. Unique recombinants, most of which were mosaics containing CRF02_AG and/or 06_cpx, made up 37% of sequences, while “pure” subtypes were rare (<6%). Pretreatment resistance to at least one drug class was observed in 17% of the cohort, with NNRTI resistance being the most common (12%) and INSTI resistance being relatively rare (2%). CXCR4-using HIV-1 sequences were identified in 23% of participants. Overall, our findings advance our understanding of HIV-1 molecular epidemiology in Ghana. Extensive HIV-1 genetic diversity in the region appears to be fueling the ongoing creation of novel recombinants, the majority CRF02_AG-containing, in the region. The relatively high prevalence of pretreatment NNRTI resistance but low prevalence of INSTI resistance supports the use of INSTI-based first-line regimens in Ghana.
Affiliation(s)
- Anna Appah
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Charlotte J. Beelen
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Don Kirkby
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Winnie Dong
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Aniqa Shahid
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Brian Foley
- Los Alamos National Laboratory, P.O. Box 1663, Los Alamos, NM 87545, USA
- Miriam Mensah
- Fevers Unit, Department of Medicine, Korle Bu Teaching Hospital, Accra P.O. Box KB 77, Ghana
- Vincent Ganu
- Department of Internal Medicine, Korle Bu Teaching Hospital, Accra P.O. Box KB 77, Ghana
- Peter Puplampu
- Department of Internal Medicine, Korle Bu Teaching Hospital, Accra P.O. Box KB 77, Ghana
- Linda E. Amoah
- Noguchi Memorial Institute for Medical Research, University of Ghana, Accra P.O. Box LG 581, Ghana
- Nicholas I. Nii-Trebi
- Department of Medical Laboratory Sciences, School of Biomedical and Allied Health Sciences, University of Ghana, Accra P.O. Box LG 25, Ghana
- Chanson J. Brumme
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
- Faculty of Medicine, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Zabrina L. Brumme
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC V6Z 1Y6, Canada
10
Khan S, Kortelainen M, Cáceres M, Williams L, Tomescu AI. Improving RNA Assembly via Safety and Completeness in Flow Decompositions. J Comput Biol 2022; 29:1270-1287. [PMID: 36288562 PMCID: PMC9807076 DOI: 10.1089/cmb.2022.0261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Abstract
Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications, we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition whose paths correspond to the underlying data that generated the flow. In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports significantly higher coverage (≈50% more) than the other safe algorithms. On the other hand, the greedy-width algorithm, although it reports better coverage, also reports significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width by ≈20% on a unified metric (F-score) that considers both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has superior time (4-5×) and space (1.2-2.2×) performance, making it a better and more practical approach for bioinformatic applications of flow decomposition.
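A safe path can be tested with a local, per-path quantity. The sketch below assumes an excess-flow criterion, which is our reading of the authors' local characterization and is not stated in this abstract. Under that criterion, a path is safe when the flow entering on its first edge exceeds the total flow that can branch off the path at its internal nodes. The example uses a hypothetical DAG. It checks individual paths only and is not the paper's algorithm for enumerating the complete set of safe paths.

```python
from collections import defaultdict

Edge = tuple[str, str]


def excess_flow(path: list[str], flow: dict[Edge, float]) -> float:
    """Flow on the first edge minus all flow that can leave the path early.

    At each internal node u, any outgoing flow not on the path's next edge
    could divert decomposition paths away, so it is subtracted.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least one edge")
    out_flow: dict[str, float] = defaultdict(float)
    for (u, _v), f in flow.items():
        out_flow[u] += f
    excess = flow[(path[0], path[1])]
    for u, nxt in zip(path[1:-1], path[2:]):
        excess -= out_flow[u] - flow[(u, nxt)]
    return excess


def is_safe(path: list[str], flow: dict[Edge, float]) -> bool:
    """Assumed criterion: a path is safe iff its excess flow is positive."""
    return excess_flow(path, flow) > 0


# Hypothetical flow on a small DAG from source s to sink t (conserved at a, b, c, d).
flow = {("s", "a"): 5, ("s", "b"): 3, ("a", "c"): 5, ("b", "c"): 3,
        ("c", "t"): 6, ("c", "d"): 2, ("d", "t"): 2}

for p in (["s", "a", "c", "t"], ["s", "b", "c", "t"], ["s", "a", "c", "d"]):
    print("->".join(p), excess_flow(p, flow), is_safe(p, flow))
```

For s->a->c->t, 5 units arrive on s->a and at most 2 can turn off to d at c. At least 3 units must therefore follow the whole path in any decomposition. For s->a->c->d the excess is negative, because the 2 units on c->d could all have come from b, so some decomposition avoids that path.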
Affiliation(s)
- Shahbaz Khan
- Department of Computer Science and Engineering, IIT Roorkee, Roorkee, India
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Address correspondence to: Prof. Shahbaz Khan, Department of Computer Science and Engineering, IIT Roorkee, Haridwar Highway, Roorkee 247667, Uttarakhand, India
- Milla Kortelainen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Manuel Cáceres
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Lucia Williams
- School of Computing, Montana State University, Bozeman, Montana, USA
11
Bal A, Simon B, Destras G, Chalvignac R, Semanas Q, Oblette A, Quéromès G, Fanget R, Regue H, Morfin F, Valette M, Lina B, Josset L. Detection and prevalence of SARS-CoV-2 co-infections during the Omicron variant circulation in France. Nat Commun 2022; 13:6316. [PMID: 36274062 PMCID: PMC9588762 DOI: 10.1038/s41467-022-33910-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/30/2022] [Accepted: 10/07/2022] [Indexed: 12/25/2022]
Abstract
From December 2021 to February 2022, an intense and unprecedented co-circulation of SARS-CoV-2 variants with high genetic diversity raised the question of possible co-infections between variants and how to detect them. Using 11 mixes of Delta:Omicron isolates at different ratios, we evaluated the performance of 4 different sets of primers used for whole-genome sequencing and developed an unbiased bioinformatics method for the detection of co-infections involving genetically distinct SARS-CoV-2 lineages. Applying it to 21,387 samples collected between December 6, 2021 and February 27, 2022 through random genomic surveillance in France, we detected 53 co-infections between different lineages. The prevalence of Delta and Omicron (BA.1) co-infections and of Omicron BA.1 and BA.2 co-infections was estimated at 0.18% and 0.26%, respectively. Among 6,242 hospitalized patients, the intensive care unit (ICU) admission rates were 1.64%, 4.81% and 15.38% in Omicron, Delta and Delta/Omicron patients, respectively. No BA.1/BA.2 co-infections were reported among patients admitted to the ICU. Among the 53 co-infected patients, 21 (39.6%) were not vaccinated. Although SARS-CoV-2 co-infections were rare in this study, their proper detection is crucial to evaluate their clinical impact and the risk of emergence of potential recombinants.
Affiliation(s)
- Antonin Bal
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
- Bruno Simon
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- Gregory Destras
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
- Richard Chalvignac
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- Quentin Semanas
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- Antoine Oblette
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- Grégory Quéromès
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
- Remi Fanget
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- Hadrien Regue
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- Florence Morfin
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
- Martine Valette
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- Bruno Lina
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
- Laurence Josset
- Laboratoire de Virologie, Institut des Agents Infectieux, Laboratoire associé au Centre National de Référence des virus des infections respiratoires, Hospices Civils de Lyon, F-69004, Lyon, France
- GenEPII sequencing platform, Institut des Agents Infectieux, Hospices Civils de Lyon, F-69004, Lyon, France
- CIRI, Centre International de Recherche en Infectiologie, Team VirPath, Univ Lyon, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, F-69007, Lyon, France
12
Chatr-aryamontri A, Hirschman L, Ross KE, Oughtred R, Krallinger M, Dolinski K, Tyers M, Korves T, Arighi CN. Overview of the COVID-19 text mining tool interactive demonstration track in BioCreative VII. Database (Oxford) 2022; 2022:baac084. [PMID: 36197453 PMCID: PMC9534061 DOI: 10.1093/database/baac084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 08/18/2022] [Accepted: 09/08/2022] [Indexed: 11/06/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have recently been developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest additional features. Considering the exploratory nature of this track, the call for participation invited teams to apply based on their system's ability to perform COVID-19-related tasks and their interest in receiving user feedback. We also recruited volunteer users to test the systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteers covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received, as shown by the overall positive feedback from the participating teams and the users. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-4/.
Affiliation(s)
- Andrew Chatr-aryamontri
- Institute for Research in Immunology and Cancer (IRIC), University of Montreal, Marcelle-Coutu Pavilion, 2950 Chem. de Polytechnique, Montreal, Quebec H3T 1J4, Canada
- Lynette Hirschman
- MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
- Karen E Ross
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, 2115 Wisconsin Ave NW, Washington, DC 20007, USA
- Rose Oughtred
- Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, South Drive, Princeton, NJ 08544, USA
- Martin Krallinger
- Barcelona Supercomputing Center (BSC), Plaça d'Eusebi Güell, 1-3, Barcelona 08034, Spain
- Kara Dolinski
- Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, South Drive, Princeton, NJ 08544, USA
- Mike Tyers
- Institute for Research in Immunology and Cancer (IRIC), University of Montreal, Marcelle-Coutu Pavilion, 2950 Chem. de Polytechnique, Montreal, Quebec H3T 1J4, Canada
- Tonia Korves
- MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
- Cecilia N Arighi
- Computer and Information Sciences Department, University of Delaware, Ammon-Pinizzotto Biopharmaceutical Innovation Building, 590 Avenue 1743, Newark, DE 19713, USA
13
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 145] [Impact Index Per Article: 48.3] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and for genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including some that were also top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Till Robin Lesker
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Gary Robertson
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Jan Buchmann
- Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
- Aydin Buluç
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California, Berkeley, Berkeley, CA, USA
- Bo Chen
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California, Berkeley, Berkeley, CA, USA
- Philip T L C Clausen
- National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
- Alexandru Cristian
- Drexel University, Philadelphia, PA, USA
- Google Inc., Philadelphia, PA, USA
- Piotr Wojciech Dabrowski
- Robert Koch-Institut, Berlin, Germany
- Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
- Rob Egan
- DOE Joint Genome Institute, Berkeley, CA, USA
- Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Eleazar Eskin
- University of California, Los Angeles, Los Angeles, CA, USA
- Eugene Goltsman
- DOE Joint Genome Institute, Berkeley, CA, USA
- Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Melissa A Gray
- Drexel University, Philadelphia, PA, USA
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Lars Hestbjerg Hansen
- University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
- Steven Hofmeyr
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California, Berkeley, Berkeley, CA, USA
- Pingqin Huang
- School of Computer Science, Fudan University, Shanghai, China
- Luiz Irber
- University of California, Davis, Davis, CA, USA
- Huijue Jia
- BGI-Shenzhen, Shenzhen, China
- Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Tue Sparholt Jørgensen
- Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark
- Aarhus University, Department of Environmental Science, Roskilde, Denmark
- Silas D Kieser
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Swiss Institute of Bioinformatics, Geneva, Switzerland
- Axel Kola
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
- Jason Kwan
- University of Wisconsin-Madison, Madison, WI, USA
- Chenhao Li
- Genome Institute of Singapore, Singapore, Singapore
- Fabio Malcher-Miranda
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Vanessa R Marcelino
- Sydney Medical School, The University of Sydney, Sydney, Australia
- Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
- Pierre Marijon
- Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
- Dmitry Meleshko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Daniel R Mende
- Amsterdam University Medical Center, Amsterdam, the Netherlands
- Alessio Milanese
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Niranjan Nagarajan
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- National University of Singapore, Singapore, Singapore
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Leonid Oliker
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California, Berkeley, Berkeley, CA, USA
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Vitor C Piro
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Evan R Rees
- University of Wisconsin-Madison, Madison, WI, USA
- Knut Reinert
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Bernhard Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Gail L Rosen
- Drexel University, Philadelphia, PA, USA
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Center for Biological Discovery from Big Data, Philadelphia, PA, USA
- Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Varuni Sarwal
- University of California, Los Angeles, Los Angeles, CA, USA
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
- Enrico Seiler
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Lizhen Shi
- Florida Polytechnic University, Lakeland, FL, USA
- Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Ashleigh Thomas
- DOE Joint Genome Institute, Berkeley, CA, USA
- University of British Columbia, Vancouver, British Columbia, Canada
- Mirko Trajkovski
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Julien Tremblay
- Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
- Zhengyang Wang
- School of Computer Science, Fudan University, Shanghai, China
- Ziye Wang
- School of Mathematical Sciences, Fudan University, Shanghai, China
- Zhong Wang
- Department of Energy Joint Genome Institute, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- School of Natural Sciences, University of California at Merced, Merced, CA, USA
- Katherine Yelick
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- University of California, Berkeley, Berkeley, CA, USA
- Ronghui You
- School of Computer Science, Fudan University, Shanghai, China
- Georg Zeller
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Jie Zhu
- BGI-Shenzhen, Shenzhen, China
- Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Susanne Häußler
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Ariane Khaledi
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Fantin Mesny
- Max Planck Institute for Plant Breeding Research, Köln, Germany
- Nathiana Smit
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till Strowig
- Helmholtz Centre for Infection Research, Braunschweig, Germany
- Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Alexander Sczyrba
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
14
Baaijens JA, Bonizzoni P, Boucher C, Della Vedova G, Pirola Y, Rizzi R, Sirén J. Computational graph pangenomics: a tutorial on data structures and their applications. Natural Computing 2022; 21:81-108. [PMID: 36969737 PMCID: PMC10038355 DOI: 10.1007/s11047-022-09882-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 02/14/2022] [Indexed: 05/08/2023]
Abstract
Computational pangenomics is an emerging research field that is changing the way computer scientists face challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes, as well as the specificity of an individual genome in a personalized approach to medicine, is rapidly pushing the field to abandon the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
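To make the idea of a graph pangenome concrete, here is a minimal toy sketch in which node sequences, directed edges and named haplotype paths replace a single linear reference, and each haplotype is recovered by spelling its path. It is an illustration under simplifying assumptions (forward strand only, no compression), not one of the data structures discussed in the tutorial; the class `SequenceGraph`, its methods and the SNP example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy graph pangenome: node sequences, directed edges and named haplotype paths.

    Forward strand only; real formats such as GFA also record node orientation.
    """
    segments: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_segment(self, node_id: int, seq: str) -> None:
        self.segments[node_id] = seq

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # A haplotype is a walk through the graph; record the edges it traverses.
        missing = [n for n in node_ids if n not in self.segments]
        if missing:
            raise KeyError(f"unknown nodes: {missing}")
        self.edges.update(zip(node_ids, node_ids[1:]))
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # Concatenate node sequences along the path to recover the linear haplotype.
        return "".join(self.segments[n] for n in self.paths[name])

    def haplotypes_through(self, node_id: int) -> list[str]:
        # Names of the haplotypes that carry the allele stored at this node.
        return sorted(name for name, p in self.paths.items() if node_id in p)


if __name__ == "__main__":
    g = SequenceGraph()
    for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"), (4, "GGA")]:
        g.add_segment(node_id, seq)
    g.add_path("ref", [1, 2, 4])  # reference allele T
    g.add_path("alt", [1, 3, 4])  # SNP: C instead of T
    print(g.spell("ref"), g.spell("alt"), g.haplotypes_through(3))
```

The SNP appears as a "bubble" (nodes 2 and 3 between shared nodes 1 and 4), so the shared sequence is stored once while each haplotype remains an explicit path.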
Affiliation(s)
- Jasmijn A. Baaijens
- Department of Intelligent Systems, Delft University of Technology, Van Mourik Broekmanweg 6, 2628XE Delft, The Netherlands
- Department of Biomedical Informatics, Harvard University, 10 Shattuck St, Boston, MA 02115, USA
- Paola Bonizzoni
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32603, USA
- Gianluca Della Vedova
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Yuri Pirola
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Raffaella Rizzi
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Jouni Sirén
- Genomics Institute, University of California, 1156 High St., Santa Cruz, CA 95064, USA
15
Marques-Pereira C, Pires M, Moreira IS. Discovery of Virus-Host interactions using bioinformatic tools. Methods Cell Biol 2022; 169:169-198. [PMID: 35623701 DOI: 10.1016/bs.mcb.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
16
Javaran VJ, Moffett P, Lemoyne P, Xu D, Adkar-Purushothama CR, Fall ML. Grapevine Virology in the Third-Generation Sequencing Era: From Virus Detection to Viral Epitranscriptomics. Plants (Basel) 2021; 10:plants10112355. [PMID: 34834718 PMCID: PMC8623739 DOI: 10.3390/plants10112355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/16/2021] [Revised: 10/16/2021] [Accepted: 10/29/2021] [Indexed: 05/30/2023]
Abstract
Among all economically important plant species in the world, grapevine (Vitis vinifera L.) is the most cultivated fruit plant. It has a significant impact on the economies of many countries through wine and fresh and dried fruit production. In recent years, the grape and wine industry has faced outbreaks of known and emerging viral diseases across the world. Although high-throughput sequencing (HTS) has been used extensively in grapevine virology, the application and potential of third-generation sequencing for understanding grapevine viruses and their impact on the grapevine have not been explored. Nanopore sequencing, a third-generation technology, can be used for the direct sequencing of both RNA and DNA with minimal infrastructure. Compared to other HTS methods, the MinION nanopore platform is faster and more cost-effective and allows for long-read sequencing. Owing to its small size, the MinION device can easily be carried for field surveillance of viral diseases. This review article discusses grapevine viruses, the principle of third-generation sequencing platforms, and the application of nanopore sequencing technology in grapevine virus detection, virus-plant interactions, and the characterization of viral RNA modifications.
Affiliation(s)
- Vahid Jalali Javaran
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
- Département de Biologie, Centre SÈVE, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Peter Moffett
- Département de Biologie, Centre SÈVE, Université de Sherbrooke, Sherbrooke, QC J1K 2R1, Canada
- Pierre Lemoyne
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
- Dong Xu
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
- Charith Raj Adkar-Purushothama
- Département de Biochimie, Faculté de Médecine des Sciences de la Santé, 3201 rue Jean-Mignault, Sherbrooke, QC J1E 4K8, Canada
- Mamadou Lamine Fall
- Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada