1
Tan ZH, Yong KY, Shu JJ. Predicting potential SARS-CoV-2 spillover and spillback in animals. Journal of Microbiology, Immunology, and Infection = Wei Mian Yu Gan Ran Za Zhi 2024; 57:225-237. [PMID: 38262772 DOI: 10.1016/j.jmii.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 12/08/2023] [Accepted: 01/02/2024] [Indexed: 01/25/2024]
Abstract
BACKGROUND The COVID-19 pandemic has spread rapidly around the world, prompting countries to impose lockdowns and driving global efforts to develop vaccines. However, human-to-animal and animal-to-human transmission cannot be ignored, as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can spread rapidly in farmed and wild animals. This could create a worrying cycle of SARS-CoV-2 spillover from humans to animals and spillback of new strains into humans, rendering vaccines ineffective. METHOD This study provides a key indicator of animals that may be susceptible hosts for SARS-CoV-2 and other coronavirus infections by analysing the phylogenetic distance between host angiotensin-converting enzyme 2 and the coronavirus spike protein. Crucially, our analysis identifies animals that are at elevated risk of a spillover and spillback incident. RESULTS One group of animals has been identified as potentially susceptible to SARS-CoV-2 by harbouring a parasitic coronavirus spike protein similar to the SARS-CoV-2 spike protein. These animals may serve as amplification hosts in spillover events from zoonotic reservoirs. This group consists of a mixture of animals infected internally and naturally: minks, dogs, cats, tigers. Additionally, no internal or natural infections have been found in masked palm civet. CONCLUSION Tracing interspecies transmission in multi-host environments based solely on in vitro and in vivo examinations of animal susceptibility or serology is a time-consuming task. This approach allows rapid identification of high-risk animals to prioritize research and assessment of the risk of zoonotic disease transmission in the environment. It is a tool to rapidly identify zoonotic species that may cause outbreaks or participate in expansion cycles of coexistence with their hosts. This can help prevent the spread of coronavirus infections between species and thereby reduce spillover and spillback incidents.
Affiliation(s)
- Zi Hian Tan
- School of Mechanical & Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Kian Yan Yong
- School of Mechanical & Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Jian-Jun Shu
- School of Mechanical & Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore.
2
Olm MR, Spencer SP, Silva EL, Sonnenburg JL. Metagenomic Immunoglobulin Sequencing (MIG-Seq) Exposes Patterns of IgA Antibody Binding in the Healthy Human Gut Microbiome. bioRxiv 2023:2023.11.21.568153. [PMID: 38045399 PMCID: PMC10690254 DOI: 10.1101/2023.11.21.568153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
IgA, the most highly produced human antibody, is continually secreted into the gut to shape the intestinal microbiota. Methodological limitations have critically hindered defining which microbial strains are targeted by IgA and why. Here, we develop a new technique, Metagenomic Immunoglobulin Sequencing (MIG-Seq), and use it to determine IgA coating levels for thousands of gut microbiome strains in healthy humans. We find that microbes associated with both health and disease have higher levels of coating, and that microbial genes are highly predictive of IgA binding levels, with mucus degradation genes especially correlated with high binding. We find a significant reduction in replication rates among microbes bound by IgA, and demonstrate that IgA binding is more correlated with host immune status than traditional microbial abundance measures. This study introduces a powerful technique for assessing strain-level IgA binding in human stool, paving the way for deeper understanding of IgA-based host microbe interactions.
Affiliation(s)
- Matthew R. Olm
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Sean P. Spencer
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Division of Gastroenterology and Hepatology, Stanford School of Medicine, Stanford, CA, 94305, USA
- Evelyn Lemus Silva
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Justin L. Sonnenburg
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Center for Human Microbiome Studies, Stanford University School of Medicine, Stanford, CA, USA
3
Liu B, Warnow T. Weighted ASTRID: fast and accurate species trees from weighted internode distances. Algorithms Mol Biol 2023; 18:6. [PMID: 37468904 PMCID: PMC10355063 DOI: 10.1186/s13015-023-00230-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 06/10/2023] [Indexed: 07/21/2023]
Abstract
BACKGROUND Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time) and have had high accuracy under many conditions that makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work of weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. RESULTS Our experimental study evaluating weighted ASTRID typically shows improvements in accuracy compared to the original (unweighted) ASTRID, and shows competitive accuracy against weighted ASTRAL, the state of the art. Our re-implementation of ASTRID also improves the runtime, with marked improvements on large datasets. CONCLUSIONS Weighted ASTRID is a new and very fast method for species tree estimation that typically improves upon ASTRID and has comparable accuracy to weighted ASTRAL, while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode .
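The internode-distance matrix described in this abstract can be illustrated with a minimal Python sketch. It covers only the unweighted averaging step: it does not implement the branch-uncertainty weighting of weighted ASTRID, nor the neighbor-joining step that turns the matrix into a tree. The nested-tuple tree encoding (each tuple is one internal node, each string is a leaf) and the function names are assumptions made for illustration and are not taken from the ASTRID code base.

```python
from collections import defaultdict
from itertools import combinations


def leaf_paths(tree, path=()):
    """Yield (leaf, path) pairs, where path lists the child indices
    followed from the top node down to the leaf."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))


def internode_distances(tree):
    """Number of internal nodes on the path between each pair of leaves.
    Assumes leaf labels are unique within the tree."""
    paths = dict(leaf_paths(tree))
    dist = {}
    for x, y in combinations(sorted(paths), 2):
        p, q = paths[x], paths[y]
        c = 0  # length of the shared prefix = depth of the common ancestor
        while p[c] == q[c]:
            c += 1
        # internal nodes from x up to the common ancestor, plus those
        # from y up to it, counting the common ancestor once
        dist[(x, y)] = len(p) + len(q) - 2 * c - 1
    return dist


def average_internode_distances(gene_trees):
    """Average each pairwise distance over the gene trees containing both leaves."""
    totals, counts = defaultdict(int), defaultdict(int)
    for tree in gene_trees:
        for pair, d in internode_distances(tree).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


if __name__ == "__main__":
    # Two hypothetical gene trees on the same four species.
    trees = [("A", "B", ("C", "D")), ("A", "C", ("B", "D"))]
    for pair, d in sorted(average_internode_distances(trees).items()):
        print(pair, d)
```

The top-level tuple has three children so that every tuple corresponds to a real internal node of an unrooted tree; a distance-based method such as neighbor joining would then be run on the resulting matrix.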
Affiliation(s)
- Baqiao Liu
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL USA
4
Sanderson T. Taxonium, a web-based tool for exploring large phylogenetic trees. eLife 2022; 11:e82392. [PMID: 36377483 PMCID: PMC9704803 DOI: 10.7554/elife.82392] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 08/02/2022] [Accepted: 10/27/2022] [Indexed: 11/16/2022]
Abstract
The COVID-19 pandemic has resulted in a step change in the scale of sequencing data, with more genomes of SARS-CoV-2 having been sequenced than any other organism on earth. These sequences reveal key insights when represented as a phylogenetic tree, which captures the evolutionary history of the virus, and allows the identification of transmission events and the emergence of new variants. However, existing web-based tools for exploring phylogenies do not scale to the size of datasets now available for SARS-CoV-2. We have developed Taxonium, a new tool that uses WebGL to allow the exploration of trees with tens of millions of nodes in the browser for the first time. Taxonium links each node to associated metadata and supports mutation-annotated trees, which are able to capture all known genetic variation in a dataset. It can either be run entirely locally in the browser, from a server-based backend, or as a desktop application. We describe insights that analysing a tree of five million sequences can provide into SARS-CoV-2 evolution, and provide a tool at cov2tree.org for exploring a public tree of more than five million SARS-CoV-2 sequences. Taxonium can be applied to any tree, and is available at taxonium.org, with source code at github.com/theosanderson/taxonium.
5
Pekar JE, Magee A, Parker E, Moshiri N, Izhikevich K, Havens JL, Gangavarapu K, Malpica Serrano LM, Crits-Christoph A, Matteson NL, Zeller M, Levy JI, Wang JC, Hughes S, Lee J, Park H, Park MS, Ching KZY, Lin RTP, Mat Isa MN, Noor YM, Vasylyeva TI, Garry RF, Holmes EC, Rambaut A, Suchard MA, Andersen KG, Worobey M, Wertheim JO. The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science 2022; 377:960-966. [PMID: 35881005 PMCID: PMC9348752 DOI: 10.1126/science.abp8337] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Received: 03/03/2022] [Accepted: 07/18/2022] [Indexed: 01/08/2023]
Abstract
Understanding the circumstances that lead to pandemics is important for their prevention. We analyzed the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted "A" and "B." Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October to 8 December), and the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans before November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.
Affiliation(s)
- Jonathan E. Pekar
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Andrew Magee
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Edyth Parker
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Niema Moshiri
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Katherine Izhikevich
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093, USA
- Jennifer L. Havens
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
- Karthik Gangavarapu
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Alexander Crits-Christoph
- W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Nathaniel L. Matteson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Mark Zeller
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Joshua I. Levy
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Jade C. Wang
- New York City Public Health Laboratory, New York City Department of Health and Mental Hygiene, New York, NY 11101, USA
- Scott Hughes
- New York City Public Health Laboratory, New York City Department of Health and Mental Hygiene, New York, NY 11101, USA
- Jungmin Lee
- Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
- Heedo Park
- Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
- BK21 Graduate Program, Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Man-Seong Park
- Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
- BK21 Graduate Program, Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Raymond Tzer Pin Lin
- National Public Health Laboratory, National Centre for Infectious Diseases, Singapore
- Mohd Noor Mat Isa
- Malaysia Genome and Vaccine Institute, Jalan Bangi, 43000 Kajang, Selangor, Malaysia
- Yusuf Muhammad Noor
- Malaysia Genome and Vaccine Institute, Jalan Bangi, 43000 Kajang, Selangor, Malaysia
- Tetyana I. Vasylyeva
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Robert F. Garry
- Tulane University, School of Medicine, Department of Microbiology and Immunology, New Orleans, LA 70112, USA
- Zalgen Labs, LCC, Frederick, MD 21703 USA
- Global Virus Network (GVN), Baltimore, MD 21201, USA
- Edward C. Holmes
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, EH9 3FL, UK
- Marc A. Suchard
- Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
- Kristian G. Andersen
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Scripps Research Translational Institute, La Jolla, CA 92037, USA
- Michael Worobey
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Joel O. Wertheim
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
6
Xu T, Kong L, Li Q. Testing Efficacy of Assembly-Free and Alignment-Free Methods for Species Identification Using Genome Skims, with Patellogastropoda as a Test Case. Genes (Basel) 2022; 13:genes13071192. [PMID: 35885975 PMCID: PMC9318368 DOI: 10.3390/genes13071192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2022] [Revised: 06/26/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023]
Abstract
Species identification has recently moved from DNA barcoding to shotgun sequencing-based “genome skimming” alternatives. Genome skims have mainly been used to assemble organelle genomes, which discards much of the nuclear genome. Recently, an alternative approach was proposed for sample identification, using unassembled genome skims, which can effectively improve phylogenetic signal and identification resolution. Studies have shown that the software Skmer and APPLES work well at estimating genomic distance and performing phylogenetic placement in birds and insects using low-coverage genome skims. In this study, we use Skmer and APPLES based on genome skims of 11 patellogastropods to perform assembly-free and alignment-free species identification and phylogenetic placement. Whether or not data corresponding to the query species are present in the reference database, Skmer selects the best-matching or closest species with COI barcodes across different genome-skim sizes, except when the database lacks species belonging to the same family as the query. APPLES cannot place patellogastropods in the correct phylogenetic position when the reference database is sparse. Our study represents the first attempt at assembly-free and alignment-free species identification of marine mollusks using genome skims, demonstrating its feasibility for patellogastropod species identification and highlighting the necessity of establishing a database to share genome skims.
Affiliation(s)
- Tao Xu
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, 5 Yushan Road, Qingdao 266003, China
- Lingfeng Kong
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, 5 Yushan Road, Qingdao 266003, China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, 5 Yushan Road, Qingdao 266003, China
- Qi Li
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, 5 Yushan Road, Qingdao 266003, China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, 5 Yushan Road, Qingdao 266003, China
7
Jiang Y, Balaban M, Zhu Q, Mirarab S. DEPP: Deep Learning Enables Extending Species Trees using Single Genes. Syst Biol 2022; 72:17-34. [PMID: 35485976 DOI: 10.1093/sysbio/syac031] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2021] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 11/13/2022]
Abstract
Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multi-locus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data.
Affiliation(s)
- Yueyu Jiang
- Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
- Metin Balaban
- Bioinformatics and Systems Biology Graduate Program, UC San Diego, CA 92093, USA
- Qiyun Zhu
- Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ 85281, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
8
Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022; 71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022]
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While there have been several species tree estimation methods introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single copy trees, which allows for methods previously designed for species tree inference in a single copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.].
Affiliation(s)
- James Willson
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mrinmoy Saha Roddur
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Baqiao Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
9
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discussed multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
10
Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021; 71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically fail to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e. local posterior probabilities). We examined the accuracy of branch length estimation, finding that estimated lengths were accurate for short branches but upwardly biased otherwise. 
This led us to derive the maximum likelihood (branch length) estimate for when RIs are given as input instead of binary gene trees; this corrected formula produced accurate estimates of branch lengths in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches > 0.3 CUs) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false positive branches in these scenarios.
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Maryland, College Park, College Park, 20742, USA
- John Gatesy
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, 10024, USA
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, 92521, USA
11
Pekar J, Worobey M, Moshiri N, Scheffler K, Wertheim JO. Timing the SARS-CoV-2 index case in Hubei province. Science 2021; 372:412-417. [PMID: 33737402 PMCID: PMC8139421 DOI: 10.1126/science.abf8003] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 11/20/2020] [Accepted: 03/15/2021] [Indexed: 12/14/2022]
Abstract
Understanding when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged is critical to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the failure of early containment and mitigation efforts for COVID-19. We used a coalescent framework to combine retrospective molecular clock inference with forward epidemiological simulations to determine how long SARS-CoV-2 could have circulated before the time of the most recent common ancestor of all sequenced SARS-CoV-2 genomes. Our results define the period between mid-October and mid-November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province, China. By characterizing the likely dynamics of the virus before it was discovered, we show that more than two-thirds of SARS-CoV-2-like zoonotic events would be self-limited, dying out without igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for detecting highly contagious pathogens with moderate mortality rates.
Affiliation(s)
- Jonathan Pekar
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
- Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Michael Worobey
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.
- Niema Moshiri
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Joel O Wertheim
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA.
12
Moshiri N, Smith DM, Mirarab S. HIV Care Prioritization Using Phylogenetic Branch Length. J Acquir Immune Defic Syndr 2021; 86:626-637. [PMID: 33394616 PMCID: PMC7933099 DOI: 10.1097/qai.0000000000002612] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2020] [Accepted: 12/14/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND The structure of HIV transmission networks can be dictated by just a few individuals. Public health interventions, such as ensuring that people living with HIV adhere to antiretroviral therapy and remain virally suppressed, can help control the spread of the virus. However, such interventions draw on limited public health resources. Determining which individuals are most at risk of transmitting HIV could allow public health officials to focus those resources on these individuals. SETTING Molecular epidemiology can help prioritize people living with HIV by patterns of transmission inferred from their sampled viral sequences. Such prioritization has previously been suggested and performed by monitoring cluster growth. In this article, we introduce Prioritization using AnCesTral edge lengths (ProACT), a phylogenetic approach for prioritizing individuals living with HIV. METHODS ProACT starts from a phylogeny inferred from sequence data and orders individuals according to their terminal branch length, breaking ties using ancestral branch lengths. We evaluated ProACT on a real data set of 926 HIV-1 subtype B pol sequences obtained in San Diego between 2005 and 2014 and on a simulated data set modeling the same epidemic. Prioritization methods are compared by their ability to predict which individuals transmit most after the prioritization. RESULTS Across all simulation conditions and most real data sampling conditions, ProACT outperformed monitoring cluster growth for multiple metrics of prioritization efficacy. CONCLUSION The simple strategy used by ProACT improves the effectiveness of prioritization compared with state-of-the-art methods that rely on monitoring the growth of transmission clusters defined based on genetic distance.
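The ordering rule described in METHODS can be sketched in a few lines. This is a minimal illustration, not the ProACT implementation. It assumes that shorter branches rank first and that ties are broken by walking up the tree one ancestral branch at a time; the abstract states neither detail. The tree encoding and the names `TOY_TREE`, `ancestral_lengths`, and `prioritize` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy phylogeny: node -> (parent, length of the branch above the node).
Tree = Dict[str, Tuple[Optional[str], float]]

TOY_TREE: Tree = {
    "root": (None, 0.0),
    "X": ("root", 0.02),
    "Y": ("root", 0.05),
    "A": ("X", 0.01),
    "B": ("X", 0.03),
    "C": ("Y", 0.01),
    "D": ("Y", 0.04),
}


def ancestral_lengths(tree: Tree, leaf: str) -> Tuple[float, ...]:
    """Branch lengths on the path from a leaf to the root, terminal branch first."""
    lengths = []
    node: Optional[str] = leaf
    while node is not None:
        parent, length = tree[node]
        if parent is not None:  # the root has no branch above it
            lengths.append(length)
        node = parent
    return tuple(lengths)


def prioritize(tree: Tree) -> List[str]:
    """Order leaves by terminal branch length, breaking ties with ancestral lengths.

    Assumption: ascending order, so the shortest terminal branch ranks first.
    Tuple comparison handles the tie-breaking lexicographically.
    """
    parents = {parent for parent, _ in tree.values() if parent is not None}
    leaves = [node for node in tree if node not in parents]
    return sorted(leaves, key=lambda leaf: ancestral_lengths(tree, leaf))


if __name__ == "__main__":
    print(prioritize(TOY_TREE))
```

In the toy tree, leaves A and C share the same terminal branch length, so the length of the branch above each parent decides their relative order.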
Affiliation(s)
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, 92093, USA
- Davey M. Smith
  - Department of Medicine, University of California, San Diego, La Jolla, 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, 92093, USA
|
13
|
Pekar J, Worobey M, Moshiri N, Scheffler K, Wertheim JO. Timing the SARS-CoV-2 Index Case in Hubei Province. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.11.20.392126. [PMID: 33269353 PMCID: PMC7709179 DOI: 10.1101/2020.11.20.392126] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
Understanding when SARS-CoV-2 emerged is critical to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the failure of early containment and mitigation efforts for COVID-19. We employed a coalescent framework to combine retrospective molecular clock inference with forward epidemiological simulations to determine how long SARS-CoV-2 could have circulated prior to the time of the most recent common ancestor. Our results define the period between mid-October and mid-November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province. By characterizing the likely dynamics of the virus before it was discovered, we show that over two-thirds of SARS-CoV-2-like zoonotic events would be self-limited, dying out without igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for detecting highly contagious pathogens with moderate mortality rates.
Affiliation(s)
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Niema Moshiri
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
|