1
Silva JM, Almeida JR. Enhancing metagenomic classification with compression-based features. Artif Intell Med 2024; 156:102948. [PMID: 39173422 DOI: 10.1016/j.artmed.2024.102948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 06/12/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods may not be effective enough in some instances. In this study, we present a novel approach for metagenomic identification, using data compressors as a feature for taxonomic classification. By evaluating a comprehensive set of compressors, including both general-purpose and genomic-specific ones, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that using features from multiple compressors can help identify taxonomy. An overall accuracy of 95% was achieved with this method on an imbalanced dataset that includes classes with limited samples. The study also showed that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification. This approach offers a significant advancement in the field of metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at https://github.com/ieeta-pt/xgTaxonomy.
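To make the idea of "compressors as features" concrete, the sketch below turns one sequence into a small feature vector of compression ratios. It is only an illustration under stated assumptions: it uses three general-purpose compressors from the Python standard library as stand-ins, whereas the study evaluates a broader set that includes genomic-specific compressors, and the function name `compression_features` is hypothetical rather than taken from the xgTaxonomy repository.

```python
import bz2
import lzma
import zlib


def compression_features(sequence: str) -> dict[str, float]:
    """Return compressed-size / original-size ratios for one sequence.

    Assumption: stdlib compressors stand in for the compressors
    evaluated in the study; this is not the authors' feature set.
    """
    raw = sequence.encode("ascii")
    if not raw:
        raise ValueError("sequence must not be empty")
    compressors = {
        "zlib": lambda data: zlib.compress(data, 9),
        "bz2": lambda data: bz2.compress(data, 9),
        "lzma": lambda data: lzma.compress(data),
    }
    # One ratio per compressor; a classifier would consume this vector.
    return {name: len(fn(raw)) / len(raw) for name, fn in compressors.items()}


# A highly repetitive toy sequence compresses well under every compressor.
features = compression_features("ACGT" * 500)
```

In a classification setting, one such vector per sample could be passed to any standard classifier; which model and which compressors work best is exactly what the study investigates.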
2
Ning H, Boyes I, Numanagić I, Rott M, Xing L, Zhang X. Diagnostics of viral infections using high-throughput genome sequencing data. Brief Bioinform 2024; 25:bbae501. [PMID: 39417677 DOI: 10.1093/bib/bbae501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 08/30/2024] [Indexed: 10/19/2024] Open
Abstract
Plant viral infections cause significant economic losses, totalling $350 billion USD in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.
Affiliation(s)
- Haochen Ning
- Department of Mathematics and Statistics, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
- Ian Boyes
- Canadian Food Inspection Agency, Centre for Plant Health, 8801 Saanich Road E., North Saanich, BC V8L 1H3, Canada
- Ibrahim Numanagić
- Department of Computer Science, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
- Michael Rott
- Canadian Food Inspection Agency, Centre for Plant Health, 8801 Saanich Road E., North Saanich, BC V8L 1H3, Canada
- Li Xing
- Department of Mathematics and Statistics, University of Saskatchewan, 106 Wiggins Road, Saskatoon, SK S7N 5E6, Canada
- Xuekui Zhang
- Department of Mathematics and Statistics, University of Victoria, 3800 Finnerty Road (Ring Road), BC V8P 5C2, Canada
3
Li Z, Guo Z, Wu W, Tan L, Long Q, Xia H, Hu M. The effects of sequencing strategies on Metagenomic pathogen detection using bronchoalveolar lavage fluid samples. Heliyon 2024; 10:e33429. [PMID: 39027502 PMCID: PMC11255660 DOI: 10.1016/j.heliyon.2024.e33429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 06/17/2024] [Accepted: 06/21/2024] [Indexed: 07/20/2024] Open
Abstract
Objectives: Metagenomic next-generation sequencing (mNGS) is a powerful tool for pathogen detection. The accuracy depends on both wet lab and dry lab procedures. The objective of our study was to assess the influence of read length and dataset size on pathogen detection. Methods: In this study, 43 clinical BALF samples, which tested positive via clinical mNGS and were consistent with the diagnosis, were subjected to re-sequencing on the Illumina NovaSeq 6000 platform. The raw re-sequencing data, consisting of 100 million (M) paired-end 150 bp (PE150) reads, were divided into simulated datasets with eight different data sizes (5 M, 10 M, 15 M, 20 M, 30 M, 50 M, 75 M, 100 M) and five different read lengths (single-end 50 bp (SE50), SE75, SE100, PE100, and PE150). Both Kraken2 and IDseq bioinformatics pipelines were employed to analyze the previously diagnosed pathogens in the simulated data. Detection of pathogens was based on read counts ranging from 1 to 10 and RPM values ranging from 0.2 to 2. Results: Our results revealed that increasing dataset sizes and read lengths can enhance the performance of mNGS in pathogen detection. However, larger data sizes for mNGS incur higher economic costs and longer turnaround times for data analysis. Our findings indicate that 20 M reads are sufficient for SE75 mode to achieve high recall rates. Additionally, high nucleic acid loads in samples can lead to increased stability in pathogen detection efficiency, reducing the impact of sequencing strategies. The choice of bioinformatics pipeline had a significant impact on the recall rates achieved in pathogen detection. Conclusions: Increasing dataset sizes and read lengths can enhance the performance of mNGS in pathogen detection but increases the economic and time costs of sequencing and data analysis. Currently, 20 M reads in SE75 mode may be the best sequencing option.
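The detection criteria mentioned in the Methods (read counts and RPM values) can be illustrated with a short sketch. It assumes RPM means reads per million sequenced reads, and it assumes both thresholds must be met at once; the abstract does not say how the two criteria were combined, so the rule, the function names, and the default thresholds here are all hypothetical.

```python
def reads_per_million(pathogen_reads: int, total_reads: int) -> float:
    """Normalize a pathogen read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return pathogen_reads * 1_000_000 / total_reads


def is_detected(
    pathogen_reads: int,
    total_reads: int,
    min_reads: int = 3,      # hypothetical value inside the 1 to 10 range
    min_rpm: float = 0.2,    # hypothetical value inside the 0.2 to 2 range
) -> bool:
    """Call a pathogen detected when both thresholds are met (assumed rule)."""
    rpm = reads_per_million(pathogen_reads, total_reads)
    return pathogen_reads >= min_reads and rpm >= min_rpm


# 10 pathogen reads in a 20 M read dataset correspond to an RPM of 0.5.
example_rpm = reads_per_million(10, 20_000_000)
```

Because RPM scales inversely with dataset size, the same absolute read count can pass the threshold in a 20 M dataset and fail it in a 100 M dataset, which is one reason the two criteria are reported separately.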
Affiliation(s)
- Ziyang Li
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Zhe Guo
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Weimin Wu
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Li Tan
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Qichen Long
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Han Xia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
- Min Hu
- Department of Laboratory Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
- Center for Clinical Molecular Diagnostics, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China
4
Liu P, Wilson P, Redquest B, Keobouasone S, Manseau M. Seq2Sat and SatAnalyzer toolkit: Towards comprehensive microsatellite genotyping from sequencing data. Mol Ecol Resour 2024; 24:e13929. [PMID: 38289068 DOI: 10.1111/1755-0998.13929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/22/2023] [Accepted: 01/09/2024] [Indexed: 03/06/2024]
Abstract
Accurate and efficient microsatellite loci genotyping is an essential process in population genetics that is also used in various demographic analyses. Protocols for next-generation sequencing of microsatellite loci enable high-throughput and cross-compatible allele scoring, common issues that are not addressed by conventional capillary-based approaches. To improve this process, we have developed an all-in-one software, called Seq2Sat (sequence to microsatellite), in C++ to support automated microsatellite genotyping. It directly takes raw reads of microsatellite amplicons and conducts read quality control before inferring genotypes based on depth-of-read, read ratio, sequence composition and length. We have also developed a module for sex identification based on sex chromosome-specific locus amplicons. To allow for greater user access and complement autoscoring, we developed SatAnalyzer (microsatellite analyzer), a user-friendly web-based platform that conducts reads-to-report analyses by calling Seq2Sat for genotype autoscoring and produces interactive genotype graphs for manual editing. SatAnalyzer also allows users to troubleshoot multiplex optimization by analysing read quality and distribution across loci and samples in support of high-quality library preparation. To evaluate its performance, we benchmarked our toolkit Seq2Sat/SatAnalyzer against a conventional capillary gel method and existing microsatellite genotyping software, MEGASAT, using two datasets. Results showed that SatAnalyzer can achieve >99.70% genotyping accuracy and Seq2Sat is ~5 times faster than MEGASAT despite many more informative tables and figures being generated. Seq2Sat and SatAnalyzer are freely available on github (https://github.com/ecogenomicscanada/Seq2Sat) and dockerhub (https://hub.docker.com/r/rocpengliu/satanalyzer).
Affiliation(s)
- Peng Liu
- Science and Technology, Environment and Climate Change Canada, Ottawa, Ontario, Canada
- Paul Wilson
- Biology Department, Trent University, Peterborough, Ontario, Canada
- Sonesinh Keobouasone
- Science and Technology, Environment and Climate Change Canada, Ottawa, Ontario, Canada
- Micheline Manseau
- Science and Technology, Environment and Climate Change Canada, Ottawa, Ontario, Canada
5
Tucker EJ, Wong SW, Marri S, Ali S, Fedele AO, Michael MZ, Rojas-Canales D, Li JY, Lim CK, Gleadle JM. SARS-CoV-2 produces a microRNA CoV2-miR-O8 in patients with COVID-19 infection. iScience 2024; 27:108719. [PMID: 38226175 PMCID: PMC10788221 DOI: 10.1016/j.isci.2023.108719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 09/28/2023] [Accepted: 12/11/2023] [Indexed: 01/17/2024] Open
Abstract
Many viruses produce microRNAs (miRNAs), termed viral miRNAs (v-miRNAs), with the capacity to target host gene expression. Bioinformatic and cell culture studies suggest that SARS-CoV-2 can also generate v-miRNAs. This patient-based study defines the SARS-CoV-2 encoded small RNAs present in nasopharyngeal swabs of patients with COVID-19 infection using small RNA-seq. A specific conserved sequence (CoV2-miR-O8) is defined that is not expressed in other coronaviruses but is preserved in all SARS-CoV-2 variants. CoV2-miR-O8 is highly represented in nasopharyngeal samples from patients with COVID-19 infection, is detected by RT-PCR assays in patients, has features consistent with Dicer and Drosha generation as well as interaction with Argonaute and targets specific human microRNAs.
Affiliation(s)
- Elise J. Tucker
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
- Soon Wei Wong
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
- Shashikanth Marri
- College of Medicine and Public Health, Flinders University, SA, Australia
- Saira Ali
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
- Anthony O. Fedele
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- Michael Z. Michael
- College of Medicine and Public Health, Flinders University, SA, Australia
- Department of Gastroenterology, Flinders Medical Centre, SA, Australia
- Darling Rojas-Canales
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
- Jordan Y. Li
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
- Chuan Kok Lim
- Infectious Diseases Laboratories, SA Pathology, SA, Australia
- Jonathan M. Gleadle
- Department of Renal Medicine, Flinders Medical Centre, SA, Australia
- College of Medicine and Public Health, Flinders University, SA, Australia
6
Will I, Beckerson WC, de Bekker C. Using machine learning to predict protein-protein interactions between a zombie ant fungus and its carpenter ant host. Sci Rep 2023; 13:13821. [PMID: 37620441 PMCID: PMC10449854 DOI: 10.1038/s41598-023-40764-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 08/16/2023] [Indexed: 08/26/2023] Open
Abstract
Parasitic fungi produce proteins that modulate virulence, alter host physiology, and trigger host responses. These proteins, classified as a type of "effector," often act via protein-protein interactions (PPIs). The fungal parasite Ophiocordyceps camponoti-floridani (zombie ant fungus) manipulates Camponotus floridanus (carpenter ant) behavior to promote transmission. The most striking aspect of this behavioral change is a summit disease phenotype where infected hosts ascend and attach to an elevated position. Plausibly, interspecific PPIs drive aspects of Ophiocordyceps infection and host manipulation. Machine learning PPI predictions offer high-throughput methods to produce mechanistic hypotheses on how this behavioral manipulation occurs. Using D-SCRIPT to predict host-parasite PPIs, we found ca. 6000 interactions involving 2083 host proteins and 129 parasite proteins, which are encoded by genes upregulated during manipulated behavior. We identified multiple overrepresentations of functional annotations among these proteins. The strongest signals in the host highlighted neuromodulatory G-protein coupled receptors and oxidation-reduction processes. We also detected Camponotus structural and gene-regulatory proteins. In the parasite, we found enrichment of Ophiocordyceps proteases and frequent involvement of novel small secreted proteins with unknown functions. From these results, we provide new hypotheses on potential parasite effectors and host targets underlying zombie ant behavioral manipulation.
Affiliation(s)
- Ian Will
- Department of Biology, University of Central Florida, 4110 Libra Drive, Orlando, FL, 32816, USA.
- William C Beckerson
- Department of Biology, University of Central Florida, 4110 Libra Drive, Orlando, FL, 32816, USA
- Charissa de Bekker
- Department of Biology, University of Central Florida, 4110 Libra Drive, Orlando, FL, 32816, USA.
- Department of Biology, Microbiology, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands.
7
Noel B, Denoeud F, Rouan A, Buitrago-López C, Capasso L, Poulain J, Boissin E, Pousse M, Da Silva C, Couloux A, Armstrong E, Carradec Q, Cruaud C, Labadie K, Lê-Hoang J, Tambutté S, Barbe V, Moulin C, Bourdin G, Iwankow G, Romac S, Agostini S, Banaigs B, Boss E, Bowler C, de Vargas C, Douville E, Flores JM, Forcioli D, Furla P, Galand PE, Lombard F, Pesant S, Reynaud S, Sullivan MB, Sunagawa S, Thomas OP, Troublé R, Thurber RV, Allemand D, Planes S, Gilson E, Zoccola D, Wincker P, Voolstra CR, Aury JM. Pervasive tandem duplications and convergent evolution shape coral genomes. Genome Biol 2023; 24:123. [PMID: 37264421 DOI: 10.1186/s13059-023-02960-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/15/2022] [Accepted: 05/05/2023] [Indexed: 06/03/2023] Open
Abstract
BACKGROUND: Over the last decade, several coral genomes have been sequenced, allowing a better understanding of these symbiotic organisms threatened by climate change. Scleractinian corals are reef builders and are central to coral reef ecosystems, providing habitat to a great diversity of species. RESULTS: As part of the Tara Pacific expedition, we assemble two coral genomes, Porites lobata and Pocillopora cf. effusa, with vastly improved contiguity that allows us to study the functional organization of these genomes. We annotate their gene catalog and report a higher gene number than that found in other public coral genome sequences, 43,000 and 32,000 genes, respectively. This finding is explained by a high number of tandemly duplicated genes, accounting for almost a third of the predicted genes. We show that these duplicated genes originate from multiple and distinct duplication events throughout the coral lineage. They contribute to the amplification of gene families, mostly related to the immune system and disease resistance, which we suggest to be functionally linked to coral host resilience. CONCLUSIONS: Overall, we show the importance of duplicated genes to inform the biology of reef-building corals and provide novel avenues to understand and screen for differences in stress resilience.
Affiliation(s)
- Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- France Denoeud
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Alice Rouan
- Université Côte d'Azur, CNRS, Inserm, IRCAN, Nice, France
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Laura Capasso
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Centre Scientifique de Monaco, Marine Biology Department, Monaco City, 98000, Monaco
- Sorbonne Université, Collège Doctoral, 75005, Paris, France
- Julie Poulain
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Emilie Boissin
- Laboratoire d'Excellence CORAIL, PSL Research University, EPHE-UPVD-CNRS, USR 3278 CRIOBE, Université de Perpignan, 52 Avenue Paul Alduy, 66860, Cedex, Perpignan, France
- Mélanie Pousse
- Université Côte d'Azur, CNRS, Inserm, IRCAN, Nice, France
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Arnaud Couloux
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Eric Armstrong
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Quentin Carradec
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Corinne Cruaud
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Karine Labadie
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Julie Lê-Hoang
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Sylvie Tambutté
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Centre Scientifique de Monaco, Marine Biology Department, Monaco City, 98000, Monaco
- Valérie Barbe
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Clémentine Moulin
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Fondation Tara Océan, Base Tara, 8 Rue de Prague, 75 012, Paris, France
- Guillaume Iwankow
- Laboratoire d'Excellence CORAIL, PSL Research University, EPHE-UPVD-CNRS, USR 3278 CRIOBE, Université de Perpignan, 52 Avenue Paul Alduy, 66860, Cedex, Perpignan, France
- Sarah Romac
- AD2M, UMR 7144, Sorbonne Université, CNRS, Station Biologique de Roscoff, ECOMAP, Roscoff, France
- Sylvain Agostini
- Shimoda Marine Research Center, University of Tsukuba, 5-10-1, Shimoda, Shizuoka, Japan
- Bernard Banaigs
- Laboratoire d'Excellence CORAIL, PSL Research University, EPHE-UPVD-CNRS, USR 3278 CRIOBE, Université de Perpignan, 52 Avenue Paul Alduy, 66860, Cedex, Perpignan, France
- Emmanuel Boss
- School of Marine Sciences, University of Maine, Orono, USA
- Chris Bowler
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Institut de Biologie de L'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 75005, Paris, France
- Colomban de Vargas
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- AD2M, UMR 7144, Sorbonne Université, CNRS, Station Biologique de Roscoff, ECOMAP, Roscoff, France
- Eric Douville
- Laboratoire Des Sciences du Climat Et de L'Environnement, LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, Gif-Sur-Yvette, 91191, France
- J Michel Flores
- Department of Earth and Planetary Sciences, Weizmann Institute of Science, 76100, Rehovot, Israel
- Didier Forcioli
- Université Côte d'Azur, CNRS, Inserm, IRCAN, Nice, France
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Paola Furla
- Université Côte d'Azur, CNRS, Inserm, IRCAN, Nice, France
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Pierre E Galand
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Sorbonne Université, CNRS, Laboratoire d'Ecogéochimie des Environnements Benthiques (LECOB), Observatoire Océanologique de Banyuls, Banyuls Sur Mer, France
- Fabien Lombard
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Institut de La Mer de Villefranche Sur Mer, Sorbonne Université, Laboratoire d'Océanographie de Villefranche, Villefranche-Sur-Mer, 06230, France
- Institut Universitaire de France, Paris, 75231, France
- Stéphane Pesant
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Stéphanie Reynaud
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Centre Scientifique de Monaco, Marine Biology Department, Monaco City, 98000, Monaco
- Matthew B Sullivan
- Departments of Microbiology and Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, 43210, USA
- Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Vladimir-Prelog-Weg 4, CH-8093, Zurich, Switzerland
- Olivier P Thomas
- School of Biological and Chemical Sciences, Ryan Institute, University of Galway, University Road H91 TK33, Galway, Ireland
- Romain Troublé
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Fondation Tara Océan, Base Tara, 8 Rue de Prague, 75 012, Paris, France
- Rebecca Vega Thurber
- Department of Microbiology, Oregon State University, 220 Nash Hall, Corvallis, OR, 97331, USA
- Denis Allemand
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Centre Scientifique de Monaco, Marine Biology Department, Monaco City, 98000, Monaco
- Serge Planes
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Laboratoire d'Excellence CORAIL, PSL Research University, EPHE-UPVD-CNRS, USR 3278 CRIOBE, Université de Perpignan, 52 Avenue Paul Alduy, 66860, Cedex, Perpignan, France
- Eric Gilson
- Université Côte d'Azur, CNRS, Inserm, IRCAN, Nice, France
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Department of Human Genetics, CHU Nice, Nice, France
- Didier Zoccola
- LIA ROPSE, Laboratoire International Associé, Université Côte d'Azur - Centre Scientifique de Monaco, France
- Centre Scientifique de Monaco, Marine Biology Department, Monaco City, 98000, Monaco
- Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France
- Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, 91057, France.
- Research Federation for the Study of Global Ocean Systems Ecology and Evolution, R2022/Tara Oceans GO-SEE, 3 Rue Michel-Ange, 75016, Paris, France.
8
Akbari Rokn Abadi S, Mohammadi A, Koohi S. A new profiling approach for DNA sequences based on the nucleotides' physicochemical features for accurate analysis of SARS-CoV-2 genomes. BMC Genomics 2023; 24:266. [PMID: 37202721 DOI: 10.1186/s12864-023-09373-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 05/11/2023] [Indexed: 05/20/2023] Open
Abstract
BACKGROUND: The prevalence of the COVID-19 disease in recent years and its widespread impact on mortality, as well as on various aspects of life around the world, has made it important to study this disease and its viral cause. However, the very long sequences of this virus increase the processing time, computational complexity, and memory consumption required by the available tools to compare and analyze the sequences. RESULTS: We present a new encoding method, named PC-mer, based on k-mers and the physicochemical properties of nucleotides. This method minimizes the size of encoded data by around 2 k times compared to the classical k-mer based profiling method. Moreover, using PC-mer, we designed two tools: 1) a machine-learning-based classification tool for coronavirus family members with the ability to receive input sequences from the NCBI database, and 2) an alignment-free computational comparison tool for calculating dissimilarity scores between coronaviruses at the genus and species levels. CONCLUSIONS: PC-mer achieves 100% accuracy despite the use of very simple classification algorithms based on Machine Learning. Assuming dynamic programming-based pairwise alignment as the ground truth approach, we achieved a degree of convergence of more than 98% for coronavirus genus-level sequences and 93% for SARS-CoV-2 sequences using PC-mer in the alignment-free classification method. This performance of PC-mer suggests that it can serve as a replacement for alignment-based approaches in certain sequence analysis applications that rely on similarity/dissimilarity scores, such as searching sequences, comparing sequences, and certain types of phylogenetic analysis methods that are based on sequence comparison.
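For context, the classical k-mer profiling baseline that PC-mer is compared against can be sketched as follows. This is not PC-mer itself, whose encoding is not described in the abstract; it is a generic k-mer frequency vector, and the function name `kmer_profile` is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int) -> list[float]:
    """Return the relative frequency of every possible DNA k-mer.

    The vector has 4**k entries, ordered lexicographically over ACGT.
    Windows containing other symbols (for example N) are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


# The 2-mer profile of a short toy sequence has 4**2 = 16 entries.
profile = kmer_profile("ACGTACGT", 2)
```

The profile length grows as 4^k, which is the storage cost that a more compact encoding aims to reduce.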
Affiliation(s)
| | | | - Somayyeh Koohi
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
| |
|
9
|
Xu X, Yin Z, Yan L, Zhang H, Xu B, Wei Y, Niu B, Schmidt B, Liu W. RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches. Genome Biol 2023; 24:121. [PMID: 37198663 PMCID: PMC10190105 DOI: 10.1186/s13059-023-02961-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/14/2022] [Accepted: 05/05/2023] [Indexed: 05/19/2023] Open
Abstract
We present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.0 TB in FASTA format, within only 34 min on a 128-core workstation. Our results further identify 1269 redundant genomes, with identical nucleotide content, in the RefSeq bacterial genomes database.
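As a rough illustration of the sketch-based distance estimation the abstract refers to, the following Python snippet builds bottom-k MinHash sketches and estimates the Jaccard similarity of two sequences from them. It is a minimal sketch under stated assumptions, not RabbitTClust's implementation: the function names, the k-mer length, the sketch size, and the choice of BLAKE2b as hash are all hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minhash_sketch(seq, k=5, size=16):
    """Bottom-k sketch: the `size` smallest distinct k-mer hash values.

    The hash function (BLAKE2b, 64-bit digest) is an illustrative choice.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big"
        )
        for kmer in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them occur in both input sketches.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Because only the fixed-size sketches are compared, the cost of each pairwise comparison is independent of genome length, which is the property that makes clustering very large genome collections tractable.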
Affiliation(s)
- Xiaoming Xu
- School of Software, Shandong University, Jinan, China
| | - Zekun Yin
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Lifeng Yan
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Hao Zhang
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Borui Xu
- School of Software, Shandong University, Jinan, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Beifang Niu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
| | - Bertil Schmidt
- Institute for Computer Science, Johannes Gutenberg University, Mainz, Germany
| | - Weiguo Liu
- School of Software, Shandong University, Jinan, China
| |
|
10
|
de Nies L, Galata V, Martin-Gallausiaux C, Despotovic M, Busi SB, Snoeck CJ, Delacour L, Budagavi DP, Laczny CC, Habier J, Lupu PC, Halder R, Fritz JV, Marques T, Sandt E, O'Sullivan MP, Ghosh S, Satagopam V, Krüger R, Fagherazzi G, Ollert M, Hefeng FQ, May P, Wilmes P. Altered infective competence of the human gut microbiome in COVID-19. Microbiome 2023; 11:46. [PMID: 36894986 PMCID: PMC9995755 DOI: 10.1186/s40168-023-01472-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND Infections with SARS-CoV-2 have a pronounced impact on the gastrointestinal tract and its resident microbiome. Clear differences between severe cases of infection and healthy individuals have been reported, including the loss of commensal taxa. We aimed to understand if microbiome alterations including functional shifts are unique to severe cases or a common effect of COVID-19. We used high-resolution systematic multi-omic analyses to profile the gut microbiome in asymptomatic-to-moderate COVID-19 individuals compared to a control group. RESULTS We found a striking increase in the overall abundance and expression of both virulence factors and antimicrobial resistance genes in COVID-19. Importantly, these genes are encoded and expressed by commensal taxa from families such as Acidaminococcaceae and Erysipelatoclostridiaceae, which we found to be enriched in COVID-19-positive individuals. We also found an enrichment in the expression of a betaherpesvirus and rotavirus C genes in COVID-19-positive individuals compared to healthy controls. CONCLUSIONS Our analyses identified an altered and increased infective competence of the gut microbiome in COVID-19 patients. Video Abstract.
Affiliation(s)
- Laura de Nies
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Valentina Galata
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Camille Martin-Gallausiaux
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Milena Despotovic
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Susheel Bhanu Busi
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Chantal J Snoeck
- Clinical and Applied Virology, Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
| | - Lea Delacour
- Luxembourg Centre for Systems Biomedicine, LCSB Operations, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Deepthi Poornima Budagavi
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Cédric Christian Laczny
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Janine Habier
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Paula-Cristina Lupu
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Rashi Halder
- Scientific Central Services, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Joëlle V Fritz
- Transversal Translation Medicine, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Taina Marques
- Translational Neuroscience Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Estelle Sandt
- Translational Medicine Operations Hub, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Marc Paul O'Sullivan
- Translational Medicine Operations Hub, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Soumyabrata Ghosh
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Venkata Satagopam
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Rejko Krüger
- Transversal Translation Medicine, Luxembourg Institute of Health, Strassen, Luxembourg
- Translational Neuroscience Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Precision Health, Luxembourg Institute of Health, Strassen, Luxembourg
| | - Markus Ollert
- Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
- Department of Dermatology and Allergy Centre, Odense University Hospital, Odense, Denmark
| | - Feng Q Hefeng
- Department of Infection and Immunity, Luxembourg Institute of Health, Esch-Sur-Alzette, Luxembourg
| | - Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Paul Wilmes
- Systems Ecology Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, 6, Avenue du Swing, L-4367, Belvaux, Luxembourg.
| |
|
11
|
Arnold AP, Chen X, Grzybowski MN, Ryan JM, Sengelaub DR, Mohanroy T, Furlan VA, Grisham W, Malloy L, Takizawa A, Wiese CB, Vergnes L, Skaletsky H, Page DC, Reue K, Harley VR, Dwinell MR, Geurts AM. A "Four Core Genotypes" rat model to distinguish mechanisms underlying sex-biased phenotypes and diseases. bioRxiv 2023:2023.02.09.527738. [PMID: 36798326 PMCID: PMC9934672 DOI: 10.1101/2023.02.09.527738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Background We have generated a rat model similar to the Four Core Genotypes mouse model, allowing comparison of XX and XY rats with the same type of gonad. The model detects novel sex chromosome effects (XX vs. XY) that contribute to sex differences in any rat phenotype. Methods XY rats were produced with an autosomal transgene of Sry, the testis-determining factor gene; these rats fathered XX and XY progeny with testes. In other rats, CRISPR-Cas9 technology was used to remove Y chromosome factors that initiate testis differentiation, producing fertile XY gonadal females that have XX and XY progeny with ovaries. These groups can be compared to detect sex differences caused by sex chromosome complement (XX vs. XY) and/or by gonadal hormones (rats with testes vs. ovaries). Results We have measured numerous phenotypes to characterize this model, including gonadal histology, breeding performance, anogenital distance, levels of reproductive hormones, body and organ weights, and central nervous system sexual dimorphisms. Serum testosterone levels were comparable in adult XX and XY gonadal males. Numerous phenotypes previously found to be sexually differentiated by the action of gonadal hormones were found to be similar in XX and XY rats with the same type of gonad, suggesting that XX and XY rats with the same type of gonad have comparable levels of gonadal hormones at various stages of development. Conclusion The results establish a powerful new model to discriminate sex chromosome and gonadal hormone effects that cause sexual differences in rat physiology and disease.
|
12
|
Forensic Analysis of Novel SARS2r-CoV Identified in Game Animal Datasets in China Shows Evolutionary Relationship to Pangolin GX CoV Clade and Apparent Genetic Experimentation. Appl Microbiol 2022. [DOI: 10.3390/applmicrobiol2040068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Pangolins are the only animals other than bats proposed to have been infected with SARS-CoV-2 related coronaviruses (SARS2r-CoVs) prior to the COVID-19 pandemic. Here, we examine the novel SARS2r-CoV we previously identified in game animal metatranscriptomic datasets sequenced by the Nanjing Agricultural University in 2022, and find that sections of the partial genome phylogenetically group with Guangxi pangolin CoVs (GX PCoVs), while the full RdRp sequence groups with bat-SL-CoVZC45. While the novel SARS2r-CoV is found in 6 pangolin datasets, it is also found in 10 additional NGS datasets from 5 separate mammalian species and is likely related to contamination by a laboratory researched virus. Absence of bat mitochondrial sequences from the datasets, the fragmentary nature of the virus sequence and the presence of a partial sequence of a cloning vector attached to a SARS2r-CoV read suggests that it has been cloned. We find that NGS datasets containing the novel SARS2r-CoV are contaminated with significant Homo sapiens genetic material, and numerous viruses not associated with the host animals sampled. We further identify the dominant human haplogroup of the contaminating H. sapiens genetic material to be F1c1a1, which is of East Asian provenance. The association of this novel SARS2r-CoV with both bat CoV and the GX PCoV clades is an important step towards identifying the origin of the GX PCoVs.
|
13
|
Thommana A, Shakya M, Gandhi J, Fung CK, Chain PSG, Maljkovic Berry I, Conte MA. Intrahost SARS-CoV-2 k-mer Identification Method (iSKIM) for Rapid Detection of Mutations of Concern Reveals Emergence of Global Mutation Patterns. Viruses 2022; 14:2128. [PMID: 36298683 PMCID: PMC9609618 DOI: 10.3390/v14102128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/23/2022] [Accepted: 09/24/2022] [Indexed: 11/27/2022] Open
Abstract
Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to relatively quickly and efficiently screen the many SARS-CoV-2 datasets to identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.
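The idea of screening raw reads for a mutation before assembly can be sketched with plain k-mer matching. The Python below is a hypothetical illustration, not the iSKIM code: the function names, the k-mer length, and the read-level counting rule are assumptions, and reverse-complement reads are deliberately ignored to keep the example short.

```python
def allele_kmers(ref_seq, pos, alt_base, k=7):
    """Return (ref_only, alt_only) k-mer sets covering 0-based `pos`.

    ref_only holds k-mers that occur only with the reference base,
    alt_only those that occur only with the substituted base.
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    # All window starts whose k-mer spans the variant position.
    starts = range(max(0, pos - k + 1), min(pos, len(ref_seq) - k) + 1)
    ref = {ref_seq[s:s + k] for s in starts}
    alt = {alt_seq[s:s + k] for s in starts}
    return ref - alt, alt - ref


def minor_variant_fraction(reads, ref_kmers, alt_kmers):
    """Fraction of informative reads that carry an alternate-allele k-mer.

    Reads matching neither k-mer set are uninformative and skipped.
    Forward-strand matching only (an illustrative simplification).
    """
    ref_hits = alt_hits = 0
    for read in reads:
        if any(km in read for km in alt_kmers):
            alt_hits += 1
        elif any(km in read for km in ref_kmers):
            ref_hits += 1
    total = ref_hits + alt_hits
    return alt_hits / total if total else 0.0
```

A fraction well below 0.5 at a site of interest would correspond to what the abstract calls an intrahost minor variant; any real threshold would need to account for sequencing error, which this sketch does not model.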
Affiliation(s)
- Ashley Thommana
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
- Montgomery Blair High School, Silver Spring, MD 20901, USA
| | - Migun Shakya
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Jaykumar Gandhi
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
| | - Christian K. Fung
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
| | - Patrick S. G. Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Irina Maljkovic Berry
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
- Integrated Research Facility, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, MD 21702, USA
| | - Matthew A. Conte
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
| |
|
14
|
Thommana A, Shakya M, Gandhi J, Fung CK, Chain PSG, Berry IM, Conte MA. Intrahost SARS-CoV-2 k-mer identification method (iSKIM) for rapid detection of mutations of concern reveals emergence of global mutation patterns. bioRxiv 2022:2022.08.16.504117. [PMID: 36032969 PMCID: PMC9413717 DOI: 10.1101/2022.08.16.504117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to relatively quickly and efficiently screen the many SARS-CoV-2 datasets to identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.
Affiliation(s)
- Ashley Thommana
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Montgomery Blair High School, Silver Spring, MD, USA
| | - Migun Shakya
- Los Alamos National Laboratory, Biosecurity and Public Health Group, Bioscience Division, Los Alamos, NM, USA
| | - Jaykumar Gandhi
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD, USA
| | - Christian K Fung
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD, USA
| | - Patrick S G Chain
- Los Alamos National Laboratory, Biosecurity and Public Health Group, Bioscience Division, Los Alamos, NM, USA
| | - Irina Maljkovic Berry
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Integrated Research Facility, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, MD, USA
| | - Matthew A Conte
- Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, MD, USA
| |
|
15
|
Becher H, Sampson J, Twyford AD. Measuring the Invisible: The Sequences Causal of Genome Size Differences in Eyebrights (Euphrasia) Revealed by k-mers. Front Plant Sci 2022; 13:818410. [PMID: 35968114 PMCID: PMC9372453 DOI: 10.3389/fpls.2022.818410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 06/20/2022] [Indexed: 06/15/2023]
Abstract
Genome size variation within plant taxa is due to presence/absence variation, which may affect low-copy sequences or genomic repeats of various frequency classes. However, identifying the sequences underpinning genome size variation is challenging because genome assemblies commonly contain collapsed representations of repetitive sequences and because genome skimming studies by design miss low-copy number sequences. Here, we take a novel approach based on k-mers, short sub-sequences of equal length k, generated from whole-genome sequencing data of diploid eyebrights (Euphrasia), a group of plants that have considerable genome size variation within a ploidy level. We compare k-mer inventories within and between closely related species, and quantify the contribution of different copy number classes to genome size differences. We further match high-copy number k-mers to specific repeat types as retrieved from the RepeatExplorer2 pipeline. We find genome size differences of up to 230 Mbp, equivalent to more than 20% genome size variation. The largest contributions to these differences come from rDNA sequences, a 145-nt genomic satellite and a repeat associated with an Angela transposable element. We also find size differences in the low-copy number class (copy number ≤ 10×) of up to 27 Mbp, possibly indicating differences in gene space between our samples. We demonstrate that it is possible to pinpoint the sequences causing genome size variation within species without the use of a reference genome. Such sequences can serve as targets for future cytogenetic studies. We also show that studies of genome size variation should go beyond repeats if they aim to characterise the full range of genomic variants. To allow future work with other taxonomic groups, we share our k-mer analysis pipeline, which is straightforward to run, relying largely on standard GNU command line tools.
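The comparison of k-mer inventories by copy-number class can be illustrated in a few lines of Python. This is a simplified sketch and not the authors' pipeline (which the abstract says relies largely on GNU command line tools): the function names and the coverage normalisation are assumptions, while the low-copy cutoff of 10 follows the "copy number ≤ 10×" class mentioned in the abstract.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def size_difference_by_class(counts_a, counts_b, coverage_a, coverage_b,
                             low_max=10):
    """Sum per-k-mer copy-number differences (sample A minus sample B).

    Raw counts are divided by each sample's sequencing coverage to get an
    approximate genomic copy number (an assumed normalisation). A k-mer is
    'low' copy if its copy number is <= low_max in both samples, else 'high'.
    """
    diff = {"low": 0.0, "high": 0.0}
    for kmer in counts_a.keys() | counts_b.keys():
        copies_a = counts_a[kmer] / coverage_a  # Counter yields 0 if absent
        copies_b = counts_b[kmer] / coverage_b
        cls = "low" if max(copies_a, copies_b) <= low_max else "high"
        diff[cls] += copies_a - copies_b
    return diff
```

The per-class totals are in units of k-mer copies, so they approximate base-pair differences only up to the overlap between neighbouring k-mers.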
Affiliation(s)
- Hannes Becher
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Jacob Sampson
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
| | - Alex D. Twyford
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
| |
|
16
|
Mugnier N, Griffon A, Simon B, Rambaud M, Regue H, Bal A, Destras G, Tournoud M, Jaillard M, Betraoui A, Santiago E, Cheynet V, Vignola A, Ligeon V, Josset L, Brengel-Pesce K. Evaluation of EPISEQ SARS-CoV-2 and a Fully Integrated Application to Identify SARS-CoV-2 Variants from Several Next-Generation Sequencing Approaches. Viruses 2022; 14:1674. [PMID: 36016297 PMCID: PMC9416160 DOI: 10.3390/v14081674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/22/2022] [Accepted: 07/27/2022] [Indexed: 11/26/2022] Open
Abstract
Whole-genome sequencing has become an essential tool for real-time genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) worldwide. The handling of raw next-generation sequencing (NGS) data is a major challenge for sequencing laboratories. We developed an easy-to-use web-based application (EPISEQ SARS-CoV-2) to analyse SARS-CoV-2 NGS data generated on common sequencing platforms using a variety of commercially available reagents. This application performs in one click a quality check, a reference-based genome assembly, and the analysis of the generated consensus sequence as to coverage of the reference genome, mutation screening and variant identification according to the up-to-date Nextstrain clade and Pango lineage. In this study, we validated the EPISEQ SARS-CoV-2 pipeline against a reference pipeline and compared the performance of NGS data generated by different sequencing protocols using EPISEQ SARS-CoV-2. We showed a strong agreement in SARS-CoV-2 clade and lineage identification (>99%) and in spike mutation detection (>99%) between EPISEQ SARS-CoV-2 and the reference pipeline. The comparison of several sequencing approaches using EPISEQ SARS-CoV-2 revealed 100% concordance in clade and lineage classification. It also uncovered reagent-related sequencing issues with a potential impact on SARS-CoV-2 mutation reporting. Altogether, EPISEQ SARS-CoV-2 allows an easy, rapid and reliable analysis of raw NGS data to support the sequencing efforts of laboratories with limited bioinformatics capacity and those willing to accelerate genomic surveillance of SARS-CoV-2.
Affiliation(s)
- Nathalie Mugnier
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Aurélien Griffon
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Bruno Simon
- GenEPII Sequencing Platform, Institut des Agents Infectieux, Hospices Civils de Lyon, 69004 Lyon, France; (B.S.); (H.R.); (A.B.); (G.D.); (L.J.)
| | - Maxence Rambaud
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Hadrien Regue
- GenEPII Sequencing Platform, Institut des Agents Infectieux, Hospices Civils de Lyon, 69004 Lyon, France; (B.S.); (H.R.); (A.B.); (G.D.); (L.J.)
| | - Antonin Bal
- GenEPII Sequencing Platform, Institut des Agents Infectieux, Hospices Civils de Lyon, 69004 Lyon, France; (B.S.); (H.R.); (A.B.); (G.D.); (L.J.)
| | - Gregory Destras
- GenEPII Sequencing Platform, Institut des Agents Infectieux, Hospices Civils de Lyon, 69004 Lyon, France; (B.S.); (H.R.); (A.B.); (G.D.); (L.J.)
| | - Maud Tournoud
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Magali Jaillard
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Abel Betraoui
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Emmanuelle Santiago
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Valérie Cheynet
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
- Joint Research Unit Hospices Civils de Lyon-bioMerieux, Centre Hospitalier Lyon Sud, 69310 Pierre-Benite, France
| | | | - Véronique Ligeon
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
| | - Laurence Josset
- GenEPII Sequencing Platform, Institut des Agents Infectieux, Hospices Civils de Lyon, 69004 Lyon, France; (B.S.); (H.R.); (A.B.); (G.D.); (L.J.)
| | - Karen Brengel-Pesce
- BioMérieux SA, 69280 Marcy-l’Étoile, France; (N.M.); (A.G.); (M.R.); (M.T.); (M.J.); (A.B.); (E.S.); (V.C.); (V.L.)
- Joint Research Unit Hospices Civils de Lyon-bioMerieux, Centre Hospitalier Lyon Sud, 69310 Pierre-Benite, France
| |
|
17
|
Zhang H, Chang Q, Yin Z, Xu X, Wei Y, Schmidt B, Liu W. RabbitV: fast detection of viruses and microorganisms in sequencing data on multi-core architectures. Bioinformatics 2022; 38:2932-2933. [PMID: 35561184 DOI: 10.1093/bioinformatics/btac187] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Revised: 02/23/2022] [Accepted: 03/24/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Detection and identification of viruses and microorganisms in sequencing data play an important role in pathogen diagnosis and research. However, existing tools for this problem often suffer from high runtimes and memory consumption. RESULTS We present RabbitV, a tool for rapid detection of viruses and microorganisms in Illumina sequencing datasets based on fast identification of unique k-mers. It can exploit the power of modern multi-core CPUs by using multi-threading, vectorization and fast data parsing. Experiments show that RabbitV outperforms fastv by a factor of at least 42.5 and 14.4 in unique k-mer generation (RabbitUniq) and pathogen identification (RabbitV), respectively. Furthermore, RabbitV is able to detect COVID-19 from 40 samples of sequencing data (255 GB in FASTQ format) in only 320 s. AVAILABILITY AND IMPLEMENTATION RabbitUniq and RabbitV are available at https://github.com/RabbitBio/RabbitUniq and https://github.com/RabbitBio/RabbitV. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
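Detection based on unique k-mers can be pictured as two steps: derive the k-mers that occur in a target genome but in no background genome, then count reads that contain any of them. The Python below is a single-threaded toy version under stated assumptions; it does not reproduce RabbitUniq or RabbitV, and all function names and parameters are hypothetical.

```python
def all_kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target, others, k):
    """K-mers present in `target` and absent from every genome in `others`."""
    background = set()
    for genome in others:
        background |= all_kmers(genome, k)
    return all_kmers(target, k) - background


def reads_hit(reads, markers, k):
    """Count reads that contain at least one marker k-mer."""
    return sum(1 for read in reads if all_kmers(read, k) & markers)
```

In practice the read-scanning loop is the part that benefits from multi-threading and vectorization, since each read can be checked independently of the others.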
Affiliation(s)
- Hao Zhang
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shenzhen, China
| | - Qixin Chang
- School of Software, Shandong University, Jinan, China
| | - Zekun Yin
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shenzhen, China
| | - Xiaoming Xu
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shenzhen, China
| | - Yanjie Wei
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Bertil Schmidt
- Institute for Computer Science, Johannes Gutenberg University, Mainz, Germany
| | - Weiguo Liu
- School of Software, Shandong University, Jinan, China
| |
|
18
|
Molecular Characteristics and Incidence of Apple Rubbery Wood Virus 2 and Citrus Virus A Infecting Pear Trees in China. Viruses 2022; 14:576. [PMID: 35336983 PMCID: PMC8952854 DOI: 10.3390/v14030576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 02/28/2022] [Accepted: 03/05/2022] [Indexed: 02/05/2023] Open
Abstract
Apple rubbery wood virus 2 (ARWV-2) and citrus virus A (CiVA) belong to the recently approved family Phenuiviridae in the order Bunyavirales and possess negative-sense single-stranded RNA genomes. In this study, the genome sequences of three ARWV-2 isolates (S17E2, LYC2, and LYXS) and a CiVA isolate (CiVA-P) infecting pear trees grown in China were characterized using high-throughput sequencing combined with conventional reverse-transcription PCR (RT-PCR) assays. The genome-wide nt sequence identities were above 93.6% among the ARWV-2 isolates and above 93% among CiVA isolates. Sequence comparisons showed that sequence diversity occurred in the 5′ untranslated region of the ARWV-2 genome and the intergenic region of the CiVA genome. For the first time, this study revealed that ARWV-2 proteins Ma and Mb displayed a plasmodesma subcellular localization, and that the MP of CiVA localizes to the cell periphery and can interact with the viral NP in bimolecular fluorescence complementation assays. RT-PCR tests disclosed that ARWV-2 occurs widely, while CiVA has a low incidence in pear trees grown in China. This study presents the first complete genome sequences and incidences of ARWV-2 and CiVA from pear trees, and the obtained results extend our knowledge of the viral pathogens of pear grown in China.
|
19
|
Aghamirza Moghim Aliabadi H, Eivazzadeh‐Keihan R, Beig Parikhani A, Fattahi Mehraban S, Maleki A, Fereshteh S, Bazaz M, Zolriasatein A, Bozorgnia B, Rahmati S, Saberi F, Yousefi Najafabadi Z, Damough S, Mohseni S, Salehzadeh H, Khakyzadeh V, Madanchi H, Kardar GA, Zarrintaj P, Saeb MR, Mozafari M. COVID-19: A systematic review and update on prevention, diagnosis, and treatment. MedComm (Beijing) 2022; 3:e115. [PMID: 35281790 PMCID: PMC8906461 DOI: 10.1002/mco2.115] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/17/2021] [Revised: 12/18/2021] [Accepted: 12/19/2021] [Indexed: 01/09/2023] Open
Abstract
Since the rapid onset of the COVID-19 (SARS-CoV-2) pandemic in 2019, extensive studies have been conducted to unveil the behavior and emission pattern of the virus in order to determine the best ways to diagnose the virus and thereby formulate effective drugs or vaccines to combat the disease. The emergence of novel diagnostic and therapeutic techniques, together with the multiplicity of reports on one side and contradictions in assessments on the other, necessitates timely updates on the progress of clinical investigations. There is also growing public anxiety over the recurring mutation of COVID-19, as reflected in the considerable mortality and transmission associated with the Delta and Omicron variants, respectively. We comprehensively review and summarize different aspects of prevention, diagnosis, and treatment of COVID-19. First, the biological characteristics of COVID-19 are explained from a diagnostic standpoint. Thereafter, the preclinical animal models of COVID-19 are discussed to frame the symptoms and clinical effects of COVID-19 from patient to patient, along with treatment strategies and in-silico/computational biology. Finally, the opportunities and challenges of nanoscience/nanotechnology in the identification, diagnosis, and treatment of COVID-19 are discussed. This review covers almost all SARS-CoV-2-related topics extensively to deepen the understanding of the latest achievements (last updated on January 11, 2022).
Affiliation(s)
- Hooman Aghamirza Moghim Aliabadi
  - Protein Chemistry Laboratory, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
  - Advance Chemical Studies Laboratory, Faculty of Chemistry, K. N. Toosi University, Tehran, Iran
- Arezoo Beig Parikhani
  - Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute, Tehran, Iran
- Ali Maleki
  - Department of Chemistry, Iran University of Science and Technology, Tehran, Iran
- Masoume Bazaz
  - Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute, Tehran, Iran
- Saman Rahmati
  - Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute, Tehran, Iran
- Fatemeh Saberi
  - Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zeinab Yousefi Najafabadi
  - Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
  - Immunology, Asthma & Allergy Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Shadi Damough
  - Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute, Tehran, Iran
- Sara Mohseni
  - Non‐metallic Materials Research Group, Niroo Research Institute, Tehran, Iran
- Vahid Khakyzadeh
  - Department of Chemistry, K. N. Toosi University of Technology, Tehran, Iran
- Hamid Madanchi
  - School of Medicine, Semnan University of Medical Sciences, Semnan, Iran
  - Drug Design and Bioinformatics Unit, Department of Medical Biotechnology, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Gholam Ali Kardar
  - Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
  - Immunology, Asthma & Allergy Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Payam Zarrintaj
  - School of Chemical Engineering, Oklahoma State University, Stillwater, Oklahoma, USA
- Mohammad Reza Saeb
  - Department of Polymer Technology, Faculty of Chemistry, Gdańsk University of Technology, Gdańsk, Poland
- Masoud Mozafari
  - Department of Tissue Engineering & Regenerative Medicine, Iran University of Medical Sciences, Tehran, Iran
|
20
|
Silva JM, Pratas D, Caetano T, Matos S. Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods. Pattern Recognition and Image Analysis 2022. [DOI: 10.1007/978-3-031-04881-4_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
21
|
Song S, Ma L, Xu X, Shi H, Li X, Liu Y, Hao P. Rapid screening and identification of viral pathogens in metagenomic data. BMC Med Genomics 2021; 14:289. [PMID: 34903237 PMCID: PMC8668262 DOI: 10.1186/s12920-021-01138-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/07/2021] [Accepted: 11/16/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Virus screening and viral genome reconstruction are urgent and crucial for the rapid identification of viral pathogens, i.e., tracing the source and understanding the pathogenesis when a viral outbreak occurs. Next-generation sequencing (NGS) provides an efficient and unbiased way to identify viral pathogens in host-associated and environmental samples without prior knowledge. Despite the availability of software, data analysis still requires human operations. A mature pipeline is urgently needed when thousands of samples must be rapidly screened for viral pathogens and used for viral genome reconstruction. RESULTS In this paper, we present a rapid and accurate workflow to screen metagenomics sequencing data for viral pathogens and other compositions, as well as enable a reference-based assembler to reconstruct viral genomes. Moreover, we tested our workflow on several metagenomics datasets, including a SARS-CoV-2 patient sample with NGS data, pangolin tissues with NGS data, Middle East Respiratory Syndrome (MERS)-infected cells with NGS data, etc. Our workflow demonstrated high accuracy and efficiency when identifying target viruses from large-scale NGS metagenomics data. Our workflow was flexible when working with a broad range of NGS datasets from small (kb) to large (100 Gb), taking from a few minutes to a few hours to complete each task. At the same time, our workflow automatically generates reports that incorporate visualized feedback (e.g., metagenomics data quality statistics, host and viral sequence compositions, details about each of the identified viral pathogens and their coverages, and reassembled viral pathogen sequences based on their closest references). CONCLUSIONS Overall, our system enabled the rapid screening and identification of viral pathogens from metagenomics data, providing an important piece to support viral pathogen research during a pandemic. The visualized report contains information from raw sequence quality to a reconstructed viral sequence, which allows non-professional people to screen their samples for viruses by themselves (Additional file 1).
Affiliation(s)
- Shiyang Song
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Liangxiao Ma
  - Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, 20031, China
- Xintian Xu
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Han Shi
  - Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Xuan Li
  - Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Yuanhua Liu
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
- Pei Hao
  - Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Shanghai, 200031, China
|
22
|
Di Pasquale A, Radomski N, Mangone I, Calistri P, Lorusso A, Cammà C. SARS-CoV-2 surveillance in Italy through phylogenomic inferences based on Hamming distances derived from pan-SNPs, -MNPs and -InDels. BMC Genomics 2021; 22:782. [PMID: 34717546 PMCID: PMC8556844 DOI: 10.1186/s12864-021-08112-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/21/2021] [Accepted: 10/20/2021] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Faced with the ongoing global pandemic of coronavirus disease, the 'National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis' (GENPAT), formally established at the 'Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise' (IZSAM) in Teramo (Italy), is in charge of SARS-CoV-2 surveillance at the genomic scale. In a context of SARS-CoV-2 surveillance requiring correct and fast assessment of epidemiological clusters from a substantial number of samples, the present study proposes an analytical workflow for accurately identifying the PANGO lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models. RESULTS GENPAT constituted two collections of SARS-CoV-2 samples. The first collection consisted of SARS-CoV-2 positive swabs collected by IZSAM from the Abruzzo region (Italy), then sequenced by next generation sequencing (NGS) and analyzed in GENPAT (n = 1592), while the second collection included samples from several Italian provinces retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17,201). The main results of the present work showed that (i) GENPAT and GISAID detected the same PANGO lineages, (ii) the PANGO lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. 'UK variant') are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGO lineages together, (iv) has a higher discriminatory power than PANGO lineages, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models. CONCLUSIONS The genome sequencing efforts of Italian provinces, combined with a structured national system of NGS data management, provided support for SARS-CoV-2 surveillance in Italy. We propose to build phylogenomic trees of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method, avoiding the typical time-consuming steps related to MSA and substitution model-based phylogenomic inference.
Affiliation(s)
- Adriano Di Pasquale
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
- Nicolas Radomski
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
- Iolanda Mangone
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
- Paolo Calistri
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
- Alessio Lorusso
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
- Cesare Cammà
  - National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data-base and bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “Giuseppe Caporale” (IZSAM), via Campo Boario, 64100 Teramo, TE Italy
|
23
|
Zhu Z, Zhang S, Wang P, Chen X, Bi J, Cheng L, Zhang X. A comprehensive review of the analysis and integration of omics data for SARS-CoV-2 and COVID-19. Brief Bioinform 2021; 23:6412396. [PMID: 34718395 PMCID: PMC8574485 DOI: 10.1093/bib/bbab446] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/28/2021] [Revised: 09/06/2021] [Accepted: 09/28/2021] [Indexed: 12/14/2022] Open
Abstract
Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, over 100 million people have been infected by COVID-19, millions of whom have died. In the past year, a large amount of omics data has emerged and helped researchers broadly study the sequence, chemical structure and function of SARS-CoV-2, as well as the abnormal molecular mechanisms of COVID-19 patients. Though some successes have been achieved in these areas, it is necessary to analyze and mine omics data to comprehensively understand SARS-CoV-2 and COVID-19. Hence, we reviewed the current advantages and limitations of the integration of omics data herein. Firstly, we sorted out the sequence resources and database resources of SARS-CoV-2, including protein chemical structure, potential drug information and research literature resources. Next, we collected omics data of the COVID-19 hosts, including genomics, transcriptomics, microbiology and potential drug information data. Subsequently, based on the integration of omics data, we summarized the existing data analysis methods and the related research results of COVID-19 multi-omics data in recent years. Finally, we put forward a research direction for SARS-CoV-2 (COVID-19) multi-omics data integration and gave a case study to mine deeper for the disease mechanisms of COVID-19.
Affiliation(s)
- Zijun Zhu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Sainan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Ping Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Xinyu Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Jianxing Bi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China, 150081
  - NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028
- Xue Zhang
  - NHC and CAMS Key Laboratory of Molecular Probe and Targeted Theranostics, Harbin Medical University, Harbin, Heilongjiang, China, 150028
  - McKusick-Zhang Center for Genetic Medicine, Peking Union Medical College, Beijing, China, 100005
|
24
|
Adamyan L, Elagin V, Vechorko V, Stepanian A, Dashko A, Doroshenko D, Aznaurova Y, Sorokin M, Suntsova M, Garazha A, Buzdin A. COVID-19-associated inhibition of energy accumulation pathways in human semen samples. F S Sci 2021; 2:355-364. [PMID: 34377996 PMCID: PMC8339600 DOI: 10.1016/j.xfss.2021.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2021] [Revised: 07/21/2021] [Accepted: 07/27/2021] [Indexed: 11/25/2022]
Abstract
Objective: To investigate transcriptional alterations in human semen samples associated with COVID-19 infection. Design: Retrospective observational cohort study. Setting: City hospital. Patient(s): Ten patients who had recovered from mild COVID-19 infection. Eight of these patients had different sperm abnormalities that were diagnosed before infection. The control group consisted of 5 healthy donors without known abnormalities and no history of COVID-19 infection. Intervention(s): We used RNA sequencing to determine gene expression profiles in all studied biosamples. Original standard bioinformatic instruments were used to analyze activation of intracellular molecular pathways. Main Outcome Measure(s): Routine semen analysis, gene expression levels, and molecular pathway activation levels in semen samples. Result(s): We found statistically significant inhibition of genes associated with energy production pathways in the mitochondria, including genes involved in the electron transfer chain and genes involved in toll-like receptor signaling. All protein-coding genes encoded by the mitochondrial genome were significantly down-regulated in semen samples collected from patients after recovery from COVID-19. Conclusion(s): Our results may provide a molecular basis for the previously observed phenomenon of decreased sperm motility associated with COVID-19 infection. Moreover, the data will be beneficial for the optimization of preconception care for men who have recently recovered from COVID-19 infection.
Affiliation(s)
- Leila Adamyan
  - A.I. Evdokimov Moscow State University of Medicine and Dentistry, 20b1 Delegatskaya St., Moscow, 127473, Russian Federation
- Vladimir Elagin
  - A.I. Evdokimov Moscow State University of Medicine and Dentistry, 20b1 Delegatskaya St., Moscow, 127473, Russian Federation
  - O.M. Filatov City clinical hospital №15, 23 Veshnjakovskaja St., Moscow, 111539, Russian Federation
- Valeriy Vechorko
  - O.M. Filatov City clinical hospital №15, 23 Veshnjakovskaja St., Moscow, 111539, Russian Federation
- Assia Stepanian
  - Academia of Women's Health and Endoscopic Surgery, 755 Mount Vernon Hwy, Atlanta, GA, 30328, USA
- Anton Dashko
  - O.M. Filatov City clinical hospital №15, 23 Veshnjakovskaja St., Moscow, 111539, Russian Federation
- Dmitriy Doroshenko
  - O.M. Filatov City clinical hospital №15, 23 Veshnjakovskaja St., Moscow, 111539, Russian Federation
- Yana Aznaurova
  - A.I. Evdokimov Moscow State University of Medicine and Dentistry, 20b1 Delegatskaya St., Moscow, 127473, Russian Federation
- Maxim Sorokin
  - Moscow Institute of Physics and Technology (National Research University), 9 Institutskij pereulok, Dolgoprudnyj city, Moscow region, 141700, Russian Federation
  - OmicsWay Corp., 340 S Lemon Ave, Walnut, CA, 91789, USA
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya St., Moscow, 119435, Russian Federation
- Maria Suntsova
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya St., Moscow, 119435, Russian Federation
- Anton Buzdin
  - Moscow Institute of Physics and Technology (National Research University), 9 Institutskij pereulok, Dolgoprudnyj city, Moscow region, 141700, Russian Federation
  - OmicsWay Corp., 340 S Lemon Ave, Walnut, CA, 91789, USA
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya St., Moscow, 119435, Russian Federation
|
25
|
New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies. Neural Comput Appl 2021; 33:15669-15692. [PMID: 34155424 PMCID: PMC8208613 DOI: 10.1007/s00521-021-06188-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2020] [Accepted: 06/02/2021] [Indexed: 12/13/2022]
Abstract
During the last 15 years, improved omics sequencing technologies have expanded the scale and resolution of various biological applications, generating high-throughput datasets that require carefully chosen software tools to be processed. Therefore, following the development of sequencing, bioinformatics researchers have been challenged to implement alignment algorithms for next-generation sequencing reads. However, the selection of aligners based on genome characteristics is currently poorly studied, so our benchmarking study extended the state of the art by comparing 17 different aligners. The chosen tools were assessed on empirical human DNA- and RNA-Seq data, as well as on simulated datasets in human and mouse, evaluating a set of parameters not previously considered in this kind of benchmark. As expected, we found that each tool was the best in specific conditions. For Ion Torrent single-end RNA-Seq samples, the most suitable aligners were CLC and BWA-MEM, which reached the best results in terms of efficiency, accuracy, duplication rate, saturation profile and running time. For Illumina paired-end osteomyelitis transcriptomics data, instead, the best-performing algorithm, together with the aforementioned CLC, was Novoalign, which excelled in accuracy and saturation analyses. Segemehl and DNASTAR performed the best on both DNA-Seq datasets, with Segemehl particularly suitable for exome data. In conclusion, our study could guide users in the selection of a suitable aligner based on genome and transcriptome characteristics. However, several other aspects that emerged from our work should be considered as the alignment research area evolves, such as the involvement of artificial intelligence to support cloud computing and mapping to multiple genomes.
|
26
|
Aury JM, Istace B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom Bioinform 2021; 3:lqab034. [PMID: 33987534 PMCID: PMC8092372 DOI: 10.1093/nargab/lqab034] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/21/2020] [Revised: 03/18/2021] [Accepted: 04/13/2021] [Indexed: 12/11/2022] Open
Abstract
Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
Affiliation(s)
- Jean-Marc Aury
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France
- Benjamin Istace
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France
|
27
|
Abouelkhair MA. Non-SARS-CoV-2 genome sequences identified in clinical samples from COVID-19 infected patients: Evidence for co-infections. PeerJ 2020; 8:e10246. [PMID: 33194423 PMCID: PMC7643552 DOI: 10.7717/peerj.10246] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/19/2020] [Accepted: 10/06/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND In December 2019, an ongoing outbreak of pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2/2019-nCoV) infection was initially reported in Wuhan, Hubei Province, China. Early in 2020, the World Health Organization (WHO) announced a new name for the 2019-nCoV-caused disease, coronavirus disease 2019 (COVID-19), and declared COVID-19 to be a Public Health Emergency of International Concern (PHEIC). Cellular co-infection is a critical determinant of viral fitness and infection outcomes and plays a crucial role in shaping the host immune response to infections. METHODS In this study, 68 public next-generation sequencing datasets from SARS-CoV-2 infected patients were retrieved from the NCBI Sequence Read Archive database using SRA-Toolkit. Data screening was performed using fastv, an alignment-free method based on k-mer mapping and extension. Taxonomic classification was performed using Kraken 2 on all reads containing one or more virus sequences other than SARS-CoV-2. RESULTS SARS-CoV-2 was identified in all except three patients. Influenza type A (H7N9) virus, human immunodeficiency virus, rhabdovirus, human metapneumovirus, Human adenovirus, Human herpesvirus 1, coronavirus NL63, parvovirus, simian virus 40, and hepatitis virus genome sequences were detected in SARS-CoV-2 infected patients. In addition, a very diverse group of bacterial populations was observed in the samples.
Affiliation(s)
- Mohamed A. Abouelkhair
  - Department of Biomedical and Diagnostic Sciences, College of Veterinary Medicine, University of Tennessee, Knoxville, TN, USA
|