1
Meleshko D, Prjbelski AD, Raiko M, Tomescu AI, Tilgner H, Hajirasouliha I. cloudrnaSPAdes: isoform assembly using bulk barcoded RNA sequencing data. Bioinformatics 2024; 40:btad781. [PMID: 38262343 PMCID: PMC10868327 DOI: 10.1093/bioinformatics/btad781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 12/09/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] Open
Abstract
MOTIVATION: Recent advancements in long-read RNA sequencing have enabled the examination of full-length isoforms, previously uncaptured by short-read sequencing methods. An alternative powerful method for studying isoforms is the use of barcoded short RNA reads, in which a barcode indicates whether two short reads arise from the same molecule. Such techniques include the 10x Genomics linked-read-based SParse Isoform Sequencing (SPIso-seq), as well as Loop-Seq and Tell-Seq. Some applications, such as novel-isoform discovery, require very high coverage. Obtaining high coverage using long reads can be difficult, making barcoded RNA-seq data a valuable alternative for this task. However, most annotation pipelines cannot work with a set of short reads in place of a single transcript, nor can they handle coverage gaps within a molecule. To overcome this challenge, we present an RNA-seq assembler that allows the determination of the expressed isoform per barcode.
RESULTS: In this article, we present cloudrnaSPAdes, a tool for assembling full-length isoforms from barcoded RNA-seq linked-read data in a reference-free fashion. Evaluating it on simulated and real human data, we found that cloudrnaSPAdes accurately assembles isoforms, even for genes with high isoform diversity.
AVAILABILITY AND IMPLEMENTATION: cloudrnaSPAdes is a feature release of the SPAdes assembler, and the version used for this article is available at https://github.com/1dayac/cloudrnaSPAdes-release.
Affiliation(s)
- Dmitry Meleshko
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, NY 10021, United States
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, United States
- Andrey D Prjbelski
  - Department of Computer Science, University of Helsinki, Helsinki 00014, Finland
- Mikhail Raiko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St Petersburg State University, St Petersburg 199004, Russia
- Alexandru I Tomescu
  - Department of Computer Science, University of Helsinki, Helsinki 00014, Finland
- Hagen Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY 10021, United States
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY 10021, United States
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, United States
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, United States
2
Mak L, Meleshko D, Danko DC, Barakzai WN, Maharjan S, Belchikov N, Hajirasouliha I. Ariadne: synthetic long read deconvolution using assembly graphs. Genome Biol 2023; 24:197. [PMID: 37641111 PMCID: PMC10463629 DOI: 10.1186/s13059-023-03033-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 08/07/2023] [Indexed: 08/31/2023] Open
Abstract
Synthetic long read sequencing techniques such as UST's TELL-Seq and Loop Genomics' LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base-pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm, that can be used to extract single-species read-clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
Affiliation(s)
- Lauren Mak
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Dmitry Meleshko
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- David C. Danko
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Salil Maharjan
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Natan Belchikov
  - Physiology, Biophysics & Systems Biology Program, Weill Cornell Medicine of Cornell University, New York, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine of Cornell University, New York, USA
3
Meleshko D, Korobeynikov A. Benchmarking State-of-the-Art Approaches for Norovirus Genome Assembly in Metagenome Sample. Biology (Basel) 2023; 12:1066. [PMID: 37626951 PMCID: PMC10451528 DOI: 10.3390/biology12081066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/18/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023]
Abstract
A recently published article in BMC Genomics by Fuentes-Trillo et al. compares assembly approaches for several noroviral samples using different tools and preprocessing strategies. The study used outdated versions of tools, as well as tools that were not designed for the viral assembly task. To improve the suboptimal assemblies, the authors suggested several sophisticated preprocessing strategies that seem to make only minor contributions to the results. We have reproduced the analysis using state-of-the-art tools designed for viral assembly, and we demonstrate that tools from the SPAdes toolkit (rnaviralSPAdes and coronaSPAdes) allow one to assemble the samples from the original study into a single contig without any additional preprocessing.
Affiliation(s)
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, St. Petersburg State University, 7/9 Universitetskaya Emb., 199004 St. Petersburg, Russia
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, St. Petersburg State University, 7/9 Universitetskaya Emb., 199004 St. Petersburg, Russia
  - Department of Statistical Modelling, St. Petersburg State University, Universitetskiy 28, 198504 St. Petersburg, Russia
4
Meleshko D, Prjbelski AD, Raiko M, Tomescu AI, Tilgner H, Hajirasouliha I. cloudrnaSPAdes: Isoform assembly using bulk barcoded RNA sequencing data. bioRxiv 2023:2023.07.25.550587. [PMID: 37546844 PMCID: PMC10402000 DOI: 10.1101/2023.07.25.550587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Motivation: Recent advancements in long-read RNA sequencing have enabled the examination of full-length isoforms, previously uncaptured by short-read sequencing methods. An alternative powerful method for studying isoforms is the use of barcoded short RNA reads, in which a barcode indicates whether two short reads arise from the same molecule. Such techniques include the 10x Genomics linked-read-based SParse Isoform Sequencing (SPIso-seq), as well as Loop-Seq and Tell-Seq. Some applications, such as novel-isoform discovery, require very high coverage. Obtaining high coverage using long reads can be difficult, making barcoded RNA-seq data a valuable alternative for this task. However, most annotation pipelines cannot work with a set of short reads in place of a single transcript, nor can they handle coverage gaps within a molecule. To overcome this challenge, we present an RNA-seq assembler that allows the determination of the expressed isoform per barcode.
Results: In this paper, we present cloudrnaSPAdes, a tool for assembling full-length isoforms from barcoded RNA-seq linked-read data in a reference-free fashion. Evaluating it on simulated and real human data, we found that cloudrnaSPAdes accurately assembles isoforms, even for genes with high isoform diversity.
Availability: cloudrnaSPAdes is a feature release of the SPAdes assembler and is available at https://cab.spbu.ru/software/cloudrnaspades/.
Affiliation(s)
- Dmitry Meleshko
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, NY, 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, NY, 10021, USA
- Mikhail Raiko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia, 199004
- Hagen Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, 10021, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine, NY, 10021, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, NY, 10021, USA
5
Popic V, Rohlicek C, Cunial F, Hajirasouliha I, Meleshko D, Garimella K, Maheshwari A. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat Methods 2023; 20:559-568. [PMID: 36959322 PMCID: PMC10152467 DOI: 10.1038/s41592-023-01799-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 01/29/2023] [Indexed: 03/25/2023]
Abstract
Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
Affiliation(s)
- Fabio Cunial
  - Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Dmitry Meleshko
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
  - Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine, New York, NY, USA
- Kiran Garimella
  - Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
6
Meleshko D, Yang R, Marks P, Williams S, Hajirasouliha I. Efficient detection and assembly of non-reference DNA sequences with synthetic long reads. Nucleic Acids Res 2022; 50:e108. [PMID: 35924489 PMCID: PMC9561269 DOI: 10.1093/nar/gkac653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/10/2022] [Accepted: 08/01/2022] [Indexed: 11/14/2022] Open
Abstract
Recent pan-genome studies have revealed an abundance of DNA sequences in human genomes that are not present in the reference genome. A lion's share of these non-reference sequences (NRSs) cannot be reliably assembled or placed on the reference genome. Improvements in long-read and synthetic long-read (aka linked-read) technologies have great potential for the characterization of NRSs. While synthetic long reads require less input DNA than long-read datasets, they are algorithmically more challenging to use. Except for computationally expensive whole-genome assembly methods, there is no synthetic long-read method for NRS detection. We propose a novel integrated alignment-based and local assembly-based algorithm, Novel-X, that uses the barcode information encoded in synthetic long reads to improve the detection of such events without a whole-genome de novo assembly. Our evaluations demonstrate that Novel-X finds many non-reference sequences that cannot be found by state-of-the-art short-read methods. We applied Novel-X to a diverse set of 68 samples from the Polaris HiSeq 4000 PGx cohort. Novel-X discovered 16 691 NRS insertions of size > 300 bp (total length 18.2 Mb). Many of them are population specific or may have a functional impact.
Affiliation(s)
- Dmitry Meleshko
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Rui Yang
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, NY 10021, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
- Patrick Marks
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Stephen Williams
  - 10x Genomics Inc., Stoneridge Mall Road, Pleasanton, CA 94566, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, NY 10021, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, NY 10021, USA
7
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 44.5] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Affiliation(s)
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Adrian Fritz
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Zhi-Luo Deng
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Till Robin Lesker
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Gary Robertson
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Jan Buchmann
  - Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
- Aydin Buluç
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Bo Chen
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Philip T L C Clausen
  - National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
- Alexandru Cristian
  - Drexel University, Philadelphia, PA, USA
  - Google Inc., Philadelphia, PA, USA
- Piotr Wojciech Dabrowski
  - Robert Koch-Institut, Berlin, Germany
  - Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
- Rob Egan
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Eleazar Eskin
  - University of California, Los Angeles, Los Angeles, CA, USA
- Eugene Goltsman
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Melissa A Gray
  - Drexel University, Philadelphia, PA, USA
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Lars Hestbjerg Hansen
  - University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
- Steven Hofmeyr
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Pingqin Huang
  - School of Computer Science, Fudan University, Shanghai, China
- Luiz Irber
  - University of California, Davis, Davis, CA, USA
- Huijue Jia
  - BGI-Shenzhen, Shenzhen, China
  - Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Tue Sparholt Jørgensen
  - Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark
  - Aarhus University, Department of Environmental Science, Roskilde, Denmark
- Silas D Kieser
  - Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland
- Axel Kola
  - Charité-Universitätsmedizin Berlin, Berlin, Germany
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
  - Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
- Jason Kwan
  - University of Wisconsin-Madison, Madison, WI, USA
- Chenhao Li
  - Genome Institute of Singapore, Singapore, Singapore
- Fabio Malcher-Miranda
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Vanessa R Marcelino
  - Sydney Medical School, The University of Sydney, Sydney, Australia
  - Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
- Pierre Marijon
  - Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Daniel R Mende
  - Amsterdam University Medical Center, Amsterdam, the Netherlands
- Alessio Milanese
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Niranjan Nagarajan
  - Genome Institute of Singapore, A*STAR, Singapore, Singapore
  - National University of Singapore, Singapore, Singapore
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Leonid Oliker
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Lucas Paoli
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Vitor C Piro
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Evan R Rees
  - University of Wisconsin-Madison, Madison, WI, USA
- Knut Reinert
  - Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Bernhard Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Gail L Rosen
  - Drexel University, Philadelphia, PA, USA
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
  - Center for Biological Discovery from Big Data, Philadelphia, PA, USA
- Hans-Joachim Ruscheweyh
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Varuni Sarwal
  - University of California, Los Angeles, Los Angeles, CA, USA
- Nicola Segata
  - Department CIBIO, University of Trento, Trento, Italy
- Enrico Seiler
  - Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Lizhen Shi
  - Florida Polytechnic University, Lakeland, FL, USA
- Fengzhu Sun
  - Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
- Shinichi Sunagawa
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Ashleigh Thomas
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - University of British Columbia, Vancouver, British Columbia, Canada
- Mirko Trajkovski
  - Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Julien Tremblay
  - Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
- Zhengyang Wang
  - School of Computer Science, Fudan University, Shanghai, China
- Ziye Wang
  - School of Mathematical Sciences, Fudan University, Shanghai, China
- Zhong Wang
  - Department of Energy Joint Genome Institute, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - School of Natural Sciences, University of California at Merced, Merced, CA, USA
- Katherine Yelick
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Ronghui You
  - School of Computer Science, Fudan University, Shanghai, China
- Georg Zeller
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Jie Zhu
  - BGI-Shenzhen, Shenzhen, China
  - Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Susanne Häußler
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Ariane Khaledi
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Fantin Mesny
  - Max Planck Institute for Plant Breeding Research, Köln, Germany
- Nathiana Smit
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till Strowig
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Andreas Bremges
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Alexander Sczyrba
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Alice Carolyn McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
8
Edgar RC, Taylor B, Lin V, Altman T, Barbera P, Meleshko D, Lohr D, Novakovsky G, Buchfink B, Al-Shayeb B, Banfield JF, de la Peña M, Korobeynikov A, Chikhi R, Babaian A. Petabase-scale sequence alignment catalyses viral discovery. Nature 2022; 602:142-147. [PMID: 35082445 DOI: 10.1038/s41586-021-04332-2] [Citation(s) in RCA: 138] [Impact Index Per Article: 69.0] [Received: 08/10/2020] [Accepted: 12/10/2021] [Indexed: 01/20/2023]
Abstract
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
Affiliation(s)
- Brie Taylor
  - Independent researcher, Vancouver, British Columbia, Canada
- Victor Lin
  - Independent researcher, Seattle, WA, USA
- Pierre Barbera
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY, USA
- Gherman Novakovsky
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Benjamin Buchfink
  - Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
- Basem Al-Shayeb
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA
- Jillian F Banfield
  - Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, CA, USA
- Marcos de la Peña
  - Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
  - Department of Statistical Modelling, St Petersburg State University, St Petersburg, Russia
- Rayan Chikhi
  - G5 Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
- Artem Babaian
  - Independent researcher, Vancouver, British Columbia, Canada
9
Meleshko D, Hajirasouliha I, Korobeynikov A. coronaSPAdes: from biosynthetic gene clusters to RNA viral assemblies. Bioinformatics 2021; 38:1-8. [PMID: 34406356 DOI: 10.1093/bioinformatics/btab597] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/24/2021] [Revised: 07/20/2021] [Accepted: 08/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The COVID-19 pandemic has ignited a broad scientific interest in viral research in general and coronavirus research in particular. The identification and characterization of viral species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers often are not able to assemble many viruses (including coronaviruses) into a single contig. Coverage variation between datasets and within dataset, presence of close strains, splice variants and contamination set a high bar for assemblers to process viral datasets with diverse properties. RESULTS We developed coronaSPAdes, a novel assembler for RNA viral species recovery in general and coronaviruses in particular. coronaSPAdes leverages the knowledge about viral genome structures to improve assembly extending ideas initially implemented in biosyntheticSPAdes. We have shown that coronaSPAdes outperforms existing SPAdes modes and other popular short-read metagenome and viral assemblers in the recovery of full-length RNA viral genomes. AVAILABILITY AND IMPLEMENTATION coronaSPAdes version used in this article is a part of SPAdes 3.15 release and is freely available at http://cab.spbu.ru/software/spades. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dmitry Meleshko
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY 10021, USA; Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg 199004, Russia; Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, NY 10021, USA
- Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, NY 10021, USA; Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg 199004, Russia; Department of Statistical Modelling, St. Petersburg State University, St. Petersburg 198504, Russia
10
Khosravi P, Lysandrou M, Eljalby M, Li Q, Kazemi E, Zisimopoulos P, Sigaras A, Brendel M, Barnes J, Ricketts C, Meleshko D, Yat A, McClure TD, Robinson BD, Sboner A, Elemento O, Chughtai B, Hajirasouliha I. A Deep Learning Approach to Diagnostic Classification of Prostate Cancer Using Pathology-Radiology Fusion. J Magn Reson Imaging 2021; 54:462-471. [PMID: 33719168 PMCID: PMC8360022 DOI: 10.1002/jmri.27599] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/05/2020] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND A definitive diagnosis of prostate cancer requires a biopsy to obtain tissue for pathologic analysis, but this is an invasive procedure and is associated with complications. PURPOSE To develop an artificial intelligence (AI)-based model (named AI-biopsy) for the early diagnosis of prostate cancer using magnetic resonance (MR) images labeled with histopathology information. STUDY TYPE Retrospective. POPULATION Magnetic resonance imaging (MRI) data sets from 400 patients with suspected prostate cancer and with histological data (228 acquired in-house and 172 from external publicly available databases). FIELD STRENGTH/SEQUENCE 1.5 to 3.0 Tesla, T2-weighted image pulse sequences. ASSESSMENT MR images reviewed and selected by two radiologists (with 6 and 17 years of experience). The patient images were labeled with prostate biopsy including Gleason Score (6 to 10) or Grade Group (1 to 5) and reviewed by one pathologist (with 15 years of experience). Deep learning models were developed to distinguish 1) benign from cancerous tumor and 2) high-risk tumor from low-risk tumor. STATISTICAL TESTS To evaluate our models, we calculated negative predictive value, positive predictive value, specificity, sensitivity, and accuracy. We also calculated areas under the receiver operating characteristic (ROC) curves (AUCs) and Cohen's kappa. RESULTS Our computational method (https://github.com/ih-lab/AI-biopsy) achieved AUCs of 0.89 (95% confidence interval [CI]: [0.86-0.92]) and 0.78 (95% CI: [0.74-0.82]) to classify cancer vs. benign and high- vs. low-risk of prostate disease, respectively. DATA CONCLUSION AI-biopsy provided a data-driven and reproducible way to assess cancer risk from MR images and a personalized strategy to potentially reduce the number of unnecessary biopsies. AI-biopsy highlighted the regions of MR images that contained the predictive features the algorithm used for diagnosis using the class activation map method. 
It is a fully automatic method with a drag-and-drop web interface (https://ai-biopsy.eipm-research.org) that allows radiologists to review AI-assessed MR images in real time. LEVEL OF EVIDENCE: 1. TECHNICAL EFFICACY STAGE: 2.
Affiliation(s)
- Pegah Khosravi
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Maria Lysandrou
- Neuroscience Institute, The University of Chicago, Chicago, Illinois, USA
- Mahmoud Eljalby
- Department of Urology, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Qianzi Li
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Mathematics and Statistics Department, Carleton College, Northfield, Minnesota, USA
- Ehsan Kazemi
- Yale University, Department of Electrical Engineering
- Pantelis Zisimopoulos
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Alexandros Sigaras
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Matthew Brendel
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Josue Barnes
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Camir Ricketts
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Dmitry Meleshko
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Andy Yat
- Department of Radiology, New York‐Presbyterian Hospital, New York, New York, USA
- Timothy D. McClure
- Department of Urology, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Brian D. Robinson
- Department of Pathology, New York Presbyterian Hospital‐Weill Cornell Medical College, New York, New York, USA
- Andrea Sboner
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- Department of Pathology, New York Presbyterian Hospital‐Weill Cornell Medical College, New York, New York, USA
- Olivier Elemento
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
- WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, New York, USA
- Bilal Chughtai
- Department of Urology, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine of Cornell University, New York, New York, USA
- Caryl and Israel Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York, USA
11
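Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr Protoc Bioinformatics 2020; 70:e102. [DOI: 10.1002/cpbi.102]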
Abstract
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Assembling isolate bacterial datasets; Basic Protocol 2: Assembling metagenomic datasets; Basic Protocol 3: Assembling sets of putative plasmids; Basic Protocol 4: Assembling transcriptomes; Basic Protocol 5: Assembling putative biosynthetic gene clusters; Support Protocol 1: Installing SPAdes; Support Protocol 2: Providing input via command line; Support Protocol 3: Providing input data via YAML format; Support Protocol 4: Restarting previous run; Support Protocol 5: Determining strand-specificity of RNA-seq data.
Affiliation(s)
- Andrey Prjibelski
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Dmitry Antipov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Dmitry Meleshko
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Alla Lapidus
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia; Department of Cytology and Histology, Saint Petersburg State University, Saint Petersburg, Russia
- Anton Korobeynikov
- Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia; Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
12
Danko D, Bezdan D, Afshin EE, Ahsanuddin S, Bhattacharya C, Butler DJ, Chng KR, Donnellan D, Hecht J, Jackson K, Kuchin K, Karasikov M, Lyons A, Mak L, Meleshko D, Mustafa H, Mutai B, Neches RY, Ng A, Nikolayeva O, Nikolayeva T, Png E, Ryon KA, Sanchez JL, Shaaban H, Sierra MA, Thomas D, Young B, Abudayyeh OO, Alicea J, Bhattacharyya M, Blekhman R, Castro-Nallar E, Cañas AM, Chatziefthimiou AD, Crawford RW, De Filippis F, Deng Y, Desnues C, Dias-Neto E, Dybwad M, Elhaik E, Ercolini D, Frolova A, Gankin D, Gootenberg JS, Graf AB, Green DC, Hajirasouliha I, Hastings JJA, Hernandez M, Iraola G, Jang S, Kahles A, Kelly FJ, Knights K, Kyrpides NC, Łabaj PP, Lee PKH, Leung MHY, Ljungdahl PO, Mason-Buck G, McGrath K, Meydan C, Mongodin EF, Moraes MO, Nagarajan N, Nieto-Caballero M, Noushmehr H, Oliveira M, Ossowski S, Osuolale OO, Özcan O, Paez-Espino D, Rascovan N, Richard H, Rätsch G, Schriml LM, Semmler T, Sezerman OU, Shi L, Shi T, Siam R, Song LH, Suzuki H, Court DS, Tighe SW, Tong X, Udekwu KI, Ugalde JA, Valentine B, Vassilev DI, Vayndorf EM, Velavan TP, Wu J, Zambrano MM, Zhu J, Zhu S, Mason CE. A global metagenomic map of urban microbiomes and antimicrobial resistance. Cell 2021; 184:3376-3393.e17. [PMID: 34043940 PMCID: PMC8238498 DOI: 10.1016/j.cell.2021.05.002] [Citation(s) in RCA: 129] [Impact Index Per Article: 43.0] [Received: 12/02/2020] [Revised: 03/05/2021] [Accepted: 04/29/2021] [Indexed: 01/14/2023]
Abstract
We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.
Affiliation(s)
- David Danko
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Daniela Bezdan
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany; NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
- Evan E Afshin
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Chandrima Bhattacharya
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Daniel J Butler
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Kern Rei Chng
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Daisy Donnellan
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Jochen Hecht
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Katelyn Jackson
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Katerina Kuchin
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Mikhail Karasikov
- ETH Zurich, Department of Computer Science, Biomedical Informatics Group, Zurich, Switzerland; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Abigail Lyons
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Lauren Mak
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Dmitry Meleshko
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Harun Mustafa
- ETH Zurich, Department of Computer Science, Biomedical Informatics Group, Zurich, Switzerland; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Beth Mutai
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Kenya Medical Research Institute - Kisumu, Kisumu, Kenya
- Russell Y Neches
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Amanda Ng
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Eileen Png
- Genome Institute of Singapore, A*STAR, Singapore, Singapore
- Krista A Ryon
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Jorge L Sanchez
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Heba Shaaban
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Maria A Sierra
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Dominique Thomas
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Ben Young
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Omar O Abudayyeh
- Massachusetts Institute of Technology, McGovern Institute for Brain Research, Cambridge, MA, USA
- Josue Alicea
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Malay Bhattacharyya
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India; Centre for Artificial Intelligence and Machine Learning, Indian Statistical Institute, Kolkata, India
- Eduardo Castro-Nallar
- Universidad Andres Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Santiago, Chile
- Ana M Cañas
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Aspassia D Chatziefthimiou
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Francesca De Filippis
- Department of Agricultural Sciences, Division of Microbiology, University of Naples Federico II, Naples, Italy; Task Force on Microbiome Studies, University of Naples Federico II, Naples, Italy
- Youping Deng
- University of Hawaii John A. Burns School of Medicine, Honolulu, HI, USA
- Christelle Desnues
- Aix-Marseille Université, Mediterranean Institute of Oceanology, Université de Toulon, CNRS, IRD, UM 110, Marseille, France
- Emmanuel Dias-Neto
- Medical Genomics group, A.C.Camargo Cancer Center, São Paulo - SP, Brazil
- Marius Dybwad
- Norwegian Defence Research Establishment FFI, Kjeller, Norway
- Eran Elhaik
- Department of Biology, Lund University, Lund, Sweden
- Danilo Ercolini
- Department of Agricultural Sciences, Division of Microbiology, University of Naples Federico II, Naples, Italy; Task Force on Microbiome Studies, University of Naples Federico II, Naples, Italy
- Alina Frolova
- Institute of Molecular Biology and Genetics of National Academy of Sciences of Ukraine, Kyiv, Ukraine; Kyiv Academic University, Kyiv, Ukraine
- Dennis Gankin
- Massachusetts Institute of Technology, McGovern Institute for Brain Research, Cambridge, MA, USA
- Jonathan S Gootenberg
- Massachusetts Institute of Technology, McGovern Institute for Brain Research, Cambridge, MA, USA
- David C Green
- Department of Analytical, Environmental and Forensic Sciences, King's College London, London, UK
- Iman Hajirasouliha
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Jaden J A Hastings
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Gregorio Iraola
- Microbial Genomics Laboratory, Institut Pasteur de Montevideo, Montevideo, Uruguay; Center for Integrative Biology, Universidad Mayor, Santiago de Chile, Santiago, Chile; Wellcome Sanger Institute, Hinxton, UK
- Andre Kahles
- ETH Zurich, Department of Computer Science, Biomedical Informatics Group, Zurich, Switzerland; Kyiv Academic University, Kyiv, Ukraine; C+, Research Center in Technologies for Society, School of Engineering, Universidad del Desarrollo, Santiago, Chile
- Frank J Kelly
- Department of Analytical, Environmental and Forensic Sciences, King's College London, London, UK
- Kaymisha Knights
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Paweł P Łabaj
- State Key Laboratory of Genetic Engineering (SKLGE) and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China; Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland; Boku University Vienna, Vienna, Austria
- Patrick K H Lee
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Marcus H Y Leung
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Per O Ljungdahl
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden
- Gabriella Mason-Buck
- Department of Analytical, Environmental and Forensic Sciences, King's College London, London, UK
- Ken McGrath
- Microba, 388 Queen St, Brisbane City, QLD 4000, Australia
- Cem Meydan
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Emmanuel F Mongodin
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Houtan Noushmehr
- University of São Paulo, Ribeirão Preto Medical School, Ribeirão Preto - SP, Brazil
- Manuela Oliveira
- Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal
- Stephan Ossowski
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany; NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, Germany
- Olayinka O Osuolale
- Applied Environmental Metagenomics and Infectious Diseases Research (AEMIDR), Department of Biological Sciences, Elizade University, Ilara-Mokin, Nigeria
- Orhan Özcan
- Acibadem Mehmet Ali Aydınlar University, Istanbul, Turkey
- David Paez-Espino
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Nicolás Rascovan
- Microbial Paleogenomics Unit, Institut Pasteur, CNRS UMR2000, Paris 75015, France
- Hugues Richard
- Sorbonne University, Faculty of Science, Institute of Biology Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France; Robert Koch Institute, Berlin, Germany
- Gunnar Rätsch
- ETH Zurich, Department of Computer Science, Biomedical Informatics Group, Zurich, Switzerland; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Leming Shi
- Center for Pharmacogenomics, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China; State Key Laboratory of Genetic Engineering (SKLGE) and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Rania Siam
- University of Medicine and Health Sciences, St. Kitts, West Indies and American University in Cairo, Cairo, Egypt
- Le Huu Song
- 108 Military Central Hospital, Hanoi, Vietnam; Vietnamese-German Center for Medical Research (VG-CARE), Hanoi, Vietnam
- Denise Syndercombe Court
- Department of Analytical, Environmental and Forensic Sciences, King's College London, London, UK
- Xinzhao Tong
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Klas I Udekwu
- Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm, Sweden; SciLife EVP, Department of Aquatic Sciences Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Juan A Ugalde
- Millennium Initiative for Collaborative Research on Bacterial Resistance, Santiago, Chile; C+, Research Center in Technologies for Society, School of Engineering, Universidad del Desarrollo, Santiago, Chile
- Brandon Valentine
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Dimitar I Vassilev
- Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski," Sofia, Bulgaria
- Elena M Vayndorf
- Institute of Arctic Biology, University of Alaska, Fairbanks, Fairbanks, AK, USA
- Thirumalaisamy P Velavan
- Institute of Tropical Medicine, Universitätsklinikum Tübingen, Tübingen, Germany; Faculty of Medicine, Duy Tan University, Da Nang, Vietnam
- Jun Wu
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jifeng Zhu
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA
- Sibo Zhu
- State Key Laboratory of Genetic Engineering (SKLGE) and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China; Department of Epidemiology, School of Public Health, Fudan University, Shanghai, China
- Christopher E Mason
- Weill Cornell Medicine, New York, NY, USA; The Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, USA; The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA.
13
Danko DC, Sierra MA, Benardini JN, Guan L, Wood JM, Singh N, Seuylemezian A, Butler DJ, Ryon K, Kuchin K, Meleshko D, Bhattacharya C, Venkateswaran KJ, Mason CE. A comprehensive metagenomics framework to characterize organisms relevant for planetary protection. Microbiome 2021; 9:82. [PMID: 33795001 PMCID: PMC8016160 DOI: 10.1186/s40168-021-01020-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/12/2020] [Accepted: 02/02/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND Clean rooms of the Space Assembly Facility (SAF) at the Jet Propulsion Laboratory (JPL) at NASA are the final step of spacecraft cleaning and assembly before launching into space. Clean rooms have stringent methods of air-filtration and cleaning to minimize microbial contamination for exoplanetary research and minimize the risk of human pathogens, but they are not sterile. Clean rooms make a selective environment for microorganisms that tolerate such cleaning methods. Previous studies have attempted to characterize the microbial cargo through sequencing and culture-dependent protocols. However, there is neither a standardized metagenomic workflow nor an analysis pipeline for spaceflight hardware cleanroom samples to identify microbial contamination. Additionally, current identification methods fail to characterize and profile the risk of low-abundance microorganisms. RESULTS A comprehensive metagenomic framework to characterize microorganisms relevant for planetary protection in multiple cleanroom classifications (from ISO-5 to ISO-8.5) and sample types (surface, filters, and debris collected via vacuum devices) was developed. Fifty-one metagenomic samples from SAF clean rooms were sequenced and analyzed to identify microbes that could potentially survive spaceflight based on their microbial features and whether the microbes expressed any metabolic activity or growth. Additionally, auxiliary testing was performed to determine the repeatability of our techniques and validate our analyses. We find evidence that JPL clean rooms carry microbes with attributes that may be problematic in space missions for their documented ability to withstand extreme conditions, such as psychrophilia and ability to form biofilms, spore-forming capacity, radiation resistance, and desiccation resistance. Samples from ISO-5 standard had lower microbial diversity than those conforming to ISO-6 or higher filters but still carried a measurable microbial load.
CONCLUSIONS Although the extensive cleaning processes limit the number of microbes capable of withstanding clean room condition, it is important to quantify thresholds and detect organisms that can inform ongoing Planetary Protection goals, provide a biological baseline for assembly facilities, and guide future mission planning. Video Abstract.
Affiliation(s)
- David C Danko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
- Maria A Sierra
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- James N Benardini
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Lisa Guan
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Jason M Wood
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Nitin Singh
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Arman Seuylemezian
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Daniel J Butler
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Krista Ryon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Katerina Kuchin
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Dmitry Meleshko
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Chandrima Bhattacharya
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
- Kasthuri J Venkateswaran
  - Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10065, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
14
Butler D, Mozsary C, Meydan C, Foox J, Rosiene J, Shaiber A, Danko D, Afshinnekoo E, MacKay M, Sedlazeck FJ, Ivanov NA, Sierra M, Pohle D, Zietz M, Gisladottir U, Ramlall V, Sholle ET, Schenck EJ, Westover CD, Hassan C, Ryon K, Young B, Bhattacharya C, Ng DL, Granados AC, Santos YA, Servellita V, Federman S, Ruggiero P, Fungtammasan A, Chin CS, Pearson NM, Langhorst BW, Tanner NA, Kim Y, Reeves JW, Hether TD, Warren SE, Bailey M, Gawrys J, Meleshko D, Xu D, Couto-Rodriguez M, Nagy-Szakal D, Barrows J, Wells H, O'Hara NB, Rosenfeld JA, Chen Y, Steel PAD, Shemesh AJ, Xiang J, Thierry-Mieg J, Thierry-Mieg D, Iftner A, Bezdan D, Sanchez E, Campion TR, Sipley J, Cong L, Craney A, Velu P, Melnick AM, Shapira S, Hajirasouliha I, Borczuk A, Iftner T, Salvatore M, Loda M, Westblade LF, Cushing M, Wu S, Levy S, Chiu C, Schwartz RE, Tatonetti N, Rennert H, Imielinski M, Mason CE. Shotgun transcriptome, spatial omics, and isothermal profiling of SARS-CoV-2 infection reveals unique host responses, viral diversification, and drug interactions. Nat Commun 2021; 12:1660. [PMID: 33712587 PMCID: PMC7954844 DOI: 10.1038/s41467-021-21361-7] [Citation(s) in RCA: 92] [Impact Index Per Article: 30.7] [Received: 09/03/2020] [Accepted: 01/25/2021] [Indexed: 02/08/2023] Open
Abstract
In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling. We applied these methods to clinical specimens gathered from 669 patients in New York City during the first two months of the outbreak, yielding a broad molecular portrait of the emerging COVID-19 disease. We find significant enrichment of a NYC-distinctive clade of the virus (20C), as well as host responses in interferon, ACE, hematological, and olfaction pathways. In addition, we use 50,821 patient records to find that renin-angiotensin-aldosterone system inhibitors have a protective effect for severe COVID-19 outcomes, unlike similar drugs. Finally, spatial transcriptomic data from COVID-19 patient autopsy tissues reveal distinct ACE2 expression loci, with macrophage and neutrophil infiltration in the lungs. These findings can inform public health and may help develop and drive SARS-CoV-2 diagnostic, prevention, and treatment strategies.
Affiliation(s)
- Daniel Butler
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Christopher Mozsary
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Cem Meydan
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- Jonathan Foox
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Joel Rosiene
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Alon Shaiber
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- David Danko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Ebrahim Afshinnekoo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- Matthew MacKay
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Nikolay A Ivanov
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Clinical & Translational Science Center, Weill Cornell Medicine, New York, NY, USA
- Maria Sierra
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Diana Pohle
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Tuebingen, Germany
- Michael Zietz
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, Columbia, NY, USA
- Undina Gisladottir
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, Columbia, NY, USA
- Vijendra Ramlall
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, Columbia, NY, USA
  - Department of Cellular, Molecular Physiology & Biophysics, Columbia University, Columbia, NY, USA
- Evan T Sholle
  - Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, USA
- Edward J Schenck
  - Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Craig D Westover
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Ciaran Hassan
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Krista Ryon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Benjamin Young
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Dianna L Ng
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
- Andrea C Granados
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA, USA
- Yale A Santos
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA, USA
- Venice Servellita
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA, USA
- Scot Federman
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA, USA
- Phyllis Ruggiero
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Justyna Gawrys
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Dmitry Meleshko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA
- Dong Xu
  - Genomics Resources Core Facility, Weill Cornell Medicine, New York, NY, USA
- Dorottya Nagy-Szakal
  - Biotia, Inc., New York, NY, USA
  - Department of Cell Biology, SUNY Downstate Health Sciences University, New York, NY, USA
- Niamh B O'Hara
  - Biotia, Inc., New York, NY, USA
  - Department of Cell Biology, SUNY Downstate Health Sciences University, New York, NY, USA
- Jeffrey A Rosenfeld
  - Rutgers Cancer Institute of New Jersey, New York, NJ, USA
  - Department of Pathology, Robert Wood Johnson Medical School, New York, NJ, USA
- Ying Chen
  - Rutgers Cancer Institute of New Jersey, New York, NJ, USA
- Peter A D Steel
  - Department of Emergency Medicine, Weill Cornell Medicine, New York, NY, USA
- Amos J Shemesh
  - Department of Emergency Medicine, Weill Cornell Medicine, New York, NY, USA
- Jenny Xiang
  - Genomics Resources Core Facility, Weill Cornell Medicine, New York, NY, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Angelika Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Tuebingen, Germany
- Daniela Bezdan
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Tuebingen, Germany
- Thomas R Campion
  - Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- John Sipley
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Lin Cong
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Arryn Craney
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Priya Velu
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Ari M Melnick
  - Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Sagi Shapira
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, Columbia, NY, USA
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Alain Borczuk
  - Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Thomas Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Tuebingen, Germany
- Mirella Salvatore
  - Department of Medicine, Weill Cornell Medicine, New York, NY, USA
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Massimo Loda
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Lars F Westblade
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Melissa Cushing
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Shixiu Wu
  - Hangzhou Cancer Institute, Hangzhou Cancer Hospital, Hangzhou, China
  - Department of Radiation Oncology, Hangzhou Cancer Hospital, Hangzhou, China
- Shawn Levy
  - HudsonAlpha Discovery Institute, Huntsville, AL, USA
- Charles Chiu
  - Department of Laboratory Medicine, University of California, San Francisco, CA, USA
  - UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, CA, USA
  - Department of Medicine, Division of Infectious Diseases, University of California, San Francisco, CA, USA
- Nicholas Tatonetti
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, Columbia, NY, USA
- Hanna Rennert
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Marcin Imielinski
  - New York Genome Center, New York, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
15
Butler DJ, Mozsary C, Meydan C, Danko D, Foox J, Rosiene J, Shaiber A, Afshinnekoo E, MacKay M, Sedlazeck FJ, Ivanov NA, Sierra M, Pohle D, Zietz M, Gisladottir U, Ramlall V, Westover CD, Ryon K, Young B, Bhattacharya C, Ruggiero P, Langhorst BW, Tanner N, Gawrys J, Meleshko D, Xu D, Steel PAD, Shemesh AJ, Xiang J, Thierry-Mieg J, Thierry-Mieg D, Schwartz RE, Iftner A, Bezdan D, Sipley J, Cong L, Craney A, Velu P, Melnick AM, Hajirasouliha I, Horner SM, Iftner T, Salvatore M, Loda M, Westblade LF, Cushing M, Levy S, Wu S, Tatonetti N, Imielinski M, Rennert H, Mason CE. Shotgun Transcriptome and Isothermal Profiling of SARS-CoV-2 Infection Reveals Unique Host Responses, Viral Diversification, and Drug Interactions. bioRxiv 2020:2020.04.20.048066. [PMID: 32511352 PMCID: PMC7255793 DOI: 10.1101/2020.04.20.048066] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 02/07/2023]
Abstract
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused thousands of deaths worldwide, including >18,000 in New York City (NYC) alone. The sudden emergence of this pandemic has highlighted a pressing clinical need for rapid, scalable diagnostics that can detect infection, interrogate strain evolution, and identify novel patient biomarkers. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs, plus a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, bacterial, and viral profiling. We applied both technologies across 857 SARS-CoV-2 clinical specimens and 86 NYC subway samples, providing a broad molecular portrait of the COVID-19 NYC outbreak. Our results define new features of SARS-CoV-2 evolution, nominate a novel, NYC-enriched viral subclade, reveal specific host responses in interferon, ACE, hematological, and olfaction pathways, and examine risks associated with use of ACE inhibitors and angiotensin receptor blockers. Together, these findings have immediate applications to SARS-CoV-2 diagnostics, public health, and new therapeutic targets.
Affiliation(s)
- Daniel J. Butler
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Cem Meydan
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
- David Danko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - Tri-Institutional Computational Biol. & Medicine Program, Weill Cornell Medicine, NY, USA
- Jonathan Foox
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
- Joel Rosiene
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Alon Shaiber
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Ebrahim Afshinnekoo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
- Matthew MacKay
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Nikolay A. Ivanov
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - Clinical & Translational Science Center, Weill Cornell Medicine, NY, USA
- Maria Sierra
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Diana Pohle
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Michael Zietz
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Undina Gisladottir
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Vijendra Ramlall
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
  - Department of Cellular, Molecular Physiology & Biophysics, Columbia University, NY, USA
- Craig D. Westover
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Krista Ryon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Benjamin Young
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Phyllis Ruggiero
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Justyna Gawrys
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Dmitry Meleshko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - Tri-Institutional Computational Biol. & Medicine Program, Weill Cornell Medicine, NY, USA
- Dong Xu
  - Genomics Resources Core Facility, Weill Cornell Medicine, NY, USA
- Amos J. Shemesh
  - Department of Emergency Medicine, Weill Cornell Medicine, NY, USA
- Jenny Xiang
  - Genomics Resources Core Facility, Weill Cornell Medicine, NY, USA
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, MD, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, MD, USA
- Angelika Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Daniela Bezdan
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- John Sipley
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Lin Cong
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Arryn Craney
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Priya Velu
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Stacy M. Horner
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, NC, USA
  - Department of Medicine, Duke University Medical Center, NC, USA
- Thomas Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Mirella Salvatore
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Massimo Loda
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Lars F. Westblade
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Melissa Cushing
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Shawn Levy
  - HudsonAlpha Discovery Institute, Huntsville, AL, USA
- Shixiu Wu
  - Hangzhou Cancer Institute, Hangzhou Cancer Hospital, Hangzhou, China
  - Department of Radiation Oncology, Hangzhou Cancer Hospital, Hangzhou, China
- Nicholas Tatonetti
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Marcin Imielinski
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Hanna Rennert
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Christopher E. Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, NY, USA
16
Meleshko D, Mohimani H, Tracanna V, Hajirasouliha I, Medema MH, Korobeynikov A, Pevzner PA. BiosyntheticSPAdes: reconstructing biosynthetic gene clusters from assembly graphs. Genome Res 2019; 29:1352-1362. [PMID: 31160374 PMCID: PMC6673720 DOI: 10.1101/gr.243477.118] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/29/2018] [Accepted: 05/29/2019] [Indexed: 12/31/2022]
Abstract
Predicting biosynthetic gene clusters (BGCs) is critically important for discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for most sequenced microbial genomes where BGCs are often scattered through several contigs, making it difficult to reconstruct them. The situation is even more severe in shotgun metagenomics, where the contigs are often short, and the existing tools fail to predict a large fraction of long BGCs. While it is difficult to assemble BGCs in a single contig, the structure of the genome assembly graph often provides clues on how to combine multiple contigs into segments encoding long BGCs. We describe biosyntheticSPAdes, a tool for predicting BGCs in assembly graphs and demonstrate that it greatly improves the reconstruction of BGCs from genomic and metagenomics data sets.
Affiliation(s)
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia, 19904
  - Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, New York 10021, USA
- Hosein Mohimani
  - Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, USA
  - Computational Biology Department, School of Computer Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Vittorio Tracanna
  - Bioinformatics Group, Wageningen University, 6708 PB Wageningen, The Netherlands
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10021, USA
  - Englander Institute for Precision Medicine, Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10021, USA
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, 6708 PB Wageningen, The Netherlands
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia, 19904
  - Department of Statistical Modelling, St. Petersburg State University, St. Petersburg, Russia, 198504
- Pavel A Pevzner
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia, 19904
  - Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, USA
17
Danko DC, Meleshko D, Bezdan D, Mason C, Hajirasouliha I. Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics. Genome Res 2019; 29:116-124. [PMID: 30523036 PMCID: PMC6314158 DOI: 10.1101/gr.235499.118] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/31/2018] [Accepted: 11/26/2018] [Indexed: 12/20/2022]
Abstract
Emerging Linked-Read technologies (aka read cloud or barcoded short-reads) have revived interest in short-read technology as a viable approach to understand large-scale structures in genomes and metagenomes. Linked-Read technologies, such as the 10x Chromium system, use a microfluidic system and a specialized set of 3' barcodes (aka UIDs) to tag short DNA reads sourced from the same long fragment of DNA; subsequently, the tagged reads are sequenced on standard short-read platforms. This approach results in interesting compromises. Each long fragment of DNA is only sparsely covered by reads, no information about the ordering of reads from the same fragment is preserved, and 3' barcodes match reads from roughly 2-20 long fragments of DNA. However, compared to long-read technologies, the cost per base to sequence is far lower, far less input DNA is required, and the per base error rate is that of Illumina short-reads. In this paper, we formally describe a particular algorithmic issue common to Linked-Read technology: the deconvolution of reads with a single 3' barcode into clusters that represent single long fragments of DNA. We introduce Minerva, a graph-based algorithm that approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we develop two demonstrations where the deconvolution of barcoded reads improves downstream results, improving the specificity of taxonomic assignments and of k-mer-based clustering. To the best of our knowledge, we are the first to address the problem of barcode deconvolution in metagenomics.
Affiliation(s)
- David C Danko
  - Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
- Dmitry Meleshko
  - Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
- Daniela Bezdan
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
- Christopher Mason
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Iman Hajirasouliha
  - Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, New York 10065, USA
  - Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
|
18
|
Abstract
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.
Affiliation(s)
- Sergey Nurk
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
  - Department of Statistical Modelling, St. Petersburg State University, St. Petersburg, Russia 198515
- Pavel A Pevzner
  - Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia 199004
  - Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, USA
|