1
Avalos-Pacheco A, Cronjäger MC, Jenkins PA, Hein J. An almost infinite sites model. Theor Popul Biol 2024; 160:49-61. [PMID: 39454763 DOI: 10.1016/j.tpb.2024.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 09/11/2024] [Accepted: 10/11/2024] [Indexed: 10/28/2024]
Abstract
MOTIVATION: A main challenge in molecular evolution is to find computationally efficient mutation models with flexible assumptions that properly reflect genetic variation. The infinite sites model assumes that each mutation event occurs at a site that has never previously mutated, i.e. it does not allow recurrent mutations. This is reasonable for low mutation rates and makes statistical inference much more tractable. However, recurrent mutations are common enough to be observable from genetic variation data, even in species with low per-site mutation rates such as humans. The finite sites model, on the other hand, allows for recurrent mutations but is computationally infeasible to work with in most cases. In this work, we bridge these two approaches by developing a novel molecular evolution model, the almost infinite sites model (AISM), that both admits recurrent mutations and is tractable. We provide a recursive characterization of the likelihood of our proposed model under complete linkage and outline a parsimonious approximation scheme for computing it. RESULTS: We show the usefulness of our model in simulated and human mitochondrial data. Our results show that the AISM, in combination with a constraint on the total number of mutation events, can recover accurate approximations to the maximum likelihood estimator of the mutation rate. AVAILABILITY AND IMPLEMENTATION: An implementation of our model is freely available along with code for reproducing our computational experiments at https://github.com/Cronjaeger/almost-infinite-sites-recursions.
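To illustrate why recurrent mutations can be observed in data: under complete linkage, binary data consistent with the infinite sites model never show all four gametes (00, 01, 10, 11) at a pair of sites. The Python sketch below applies this standard four-gamete check. It is an illustration only, not the recursion from the authors' repository; the function name and the toy sample are hypothetical.

```python
from itertools import combinations


def incompatible_site_pairs(haplotypes):
    """Return pairs of site indices at which all four gametes occur.

    `haplotypes` is a list of equal-length strings over {'0', '1'}.
    Without recombination, such a pair cannot arise under the infinite
    sites model, so it signals at least one recurrent mutation.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site gametes seen in the sample.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


# Hypothetical toy sample: sites 0 and 1 show 00, 10, 01 and 11.
sample = ["000", "100", "010", "110"]
print(incompatible_site_pairs(sample))
```

An empty result means the sample passes this pairwise check; a non-empty result lists site pairs that need a recurrent mutation to explain them when recombination is excluded.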
Affiliation(s)
- Alejandra Avalos-Pacheco
- Institute of Applied Statistics, Johannes Kepler University Linz, 4040 Linz, Austria; Harvard-MIT Center for Regulatory Science, Harvard University, 210 Longwood Ave, Boston, MA 02155, United States of America
- Mathias C Cronjäger
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford OX1 3LB, United Kingdom; Novo Nordisk, 2880 Bagsværd, Denmark
- Paul A Jenkins
- Department of Statistics, University of Warwick, Coventry, CV4 7AL, United Kingdom; Department of Computer Science, University of Warwick, Coventry, CV4 7AL, United Kingdom; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, United Kingdom
- Jotun Hein
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford OX1 3LB, United Kingdom
2
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. Genetics 2024; 228:iyae100. [PMID: 39013109 PMCID: PMC11373519 DOI: 10.1093/genetics/iyae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 06/05/2024] [Indexed: 07/18/2024] Open
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
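The idea of describing an ARG through genomes and their intervals of inheritance can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' formalism or any library's API: genomes are integer IDs, each edge records that a child inherits the half-open interval [left, right) from a parent, and `Edge`, `EDGES` and `local_parents` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Child genome inherits the interval [left, right) from parent."""
    left: float
    right: float
    parent: int
    child: int


def local_parents(edges, position):
    """Map each child to its parent in the local tree at `position`."""
    return {
        e.child: e.parent
        for e in edges
        if e.left <= position < e.right
    }


# Hypothetical toy ARG on a genome of length 10.
# Genome 1 inherits its left half via genome 2 and its right half
# directly from genome 3, so the local tree changes at position 5.
EDGES = [
    Edge(0, 10, parent=2, child=0),
    Edge(0, 5, parent=2, child=1),
    Edge(5, 10, parent=3, child=1),
    Edge(0, 10, parent=3, child=2),
]

print(local_parents(EDGES, 2))  # local tree left of the breakpoint
print(local_parents(EDGES, 7))  # local tree right of the breakpoint
```

No recombination or coalescence event is stored explicitly in this sketch; the change in genealogy along the genome follows from the intervals alone.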
Affiliation(s)
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Anastasia Ignatieva
- School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8TA, UK
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Jere Koskela
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle NE1 7RU, UK
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
- Anthony W Wohns
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5101, USA
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
3
Coomber A, Saville A, Ristaino JB. Evolution of Phytophthora infestans on its potato host since the Irish potato famine. Nat Commun 2024; 15:6488. [PMID: 39103347 DOI: 10.1038/s41467-024-50749-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 07/18/2024] [Indexed: 08/07/2024] Open
Abstract
Phytophthora infestans is a major oomycete plant pathogen, responsible for potato late blight, which led to the Irish Potato Famine from 1845 to 1852. Since then, potatoes resistant to this disease have been bred and deployed worldwide. Their resistance (R) genes recognize pathogen effectors responsible for virulence and then induce a plant response that stops disease progression. However, most deployed R genes are quickly overcome by the pathogen. We use targeted sequencing of effector and R genes on herbarium specimens to examine the joint evolution of both P. infestans and potato from 1845 to 1954. Currently relevant effectors are historically present in P. infestans, but with alternative alleles compared to modern reference genomes. The historic FAM-1 lineage had the virulent Avr1 allele and the ability to break the R1 resistance gene before breeders deployed it in potato. The FAM-1 lineage is diploid, but later, triploid US-1 lineages appear. We show that pathogen virulence genes and host resistance genes have undergone significant changes since the Famine, from both natural and artificial selection.
Affiliation(s)
- Allison Coomber
- Department of Entomology and Plant Pathology, NC State University, Raleigh, NC, USA
- Functional Genomics Program, NC State University, Raleigh, NC, USA
- Amanda Saville
- Department of Entomology and Plant Pathology, NC State University, Raleigh, NC, USA
- Jean Beagle Ristaino
- Department of Entomology and Plant Pathology, NC State University, Raleigh, NC, USA
- Emerging Plant Disease and Global Food Security Cluster, NC State University, Raleigh, NC, USA
4
Goli RC, Chishi KG, Ganguly I, Singh S, Dixit S, Rathi P, Diwakar V, Sree C C, Limbalkar OM, Sukhija N, Kanaka K. Global and Local Ancestry and its Importance: A Review. Curr Genomics 2024; 25:237-260. [PMID: 39156729 PMCID: PMC11327809 DOI: 10.2174/0113892029298909240426094055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/02/2024] [Accepted: 03/11/2024] [Indexed: 08/20/2024] Open
Abstract
Admixture is the evolutionary mechanism that most rapidly changes the genetic composition of a population. In the history of animal breeding, genetic admixture has provided short-term advantages by exploiting complementarity and heterosis in several traits, and long-term advantages by increasing genetic diversity. The traditional method of admixture analysis based on pedigree records has now largely been replaced by genome-wide marker data, which enable more precise estimates. Among these markers, SNPs have been the popular choice because they are cost-effective, less laborious, and easy to genotype in an automated way. Markers that can indicate the likely population of origin of a DNA sample, where the source individual is unknown or unwilling to disclose their lineage, are called Ancestry-Informative Markers (AIMs). Admixture resolved at the level of individual loci is termed local ancestry; it can be exploited to identify signs of recent selective response and can account for genetic drift. Considering the importance of genetic admixture and local ancestry, this mini-review illustrates both concepts, covering their basics, estimation and identification methods, the tools and software used, and their applications.
Affiliation(s)
- Kiyevi G. Chishi
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Indrajit Ganguly
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Sanjeev Singh
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- S.P. Dixit
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Pallavi Rathi
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Vikas Diwakar
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Chandana Sree C
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Nidhi Sukhija
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Central Tasar Research and Training Institute, Ranchi, 835303, Jharkhand, India
- K.K Kanaka
- ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834010, Jharkhand, India
5
Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. bioRxiv 2024:2023.11.03.565466. [PMID: 37961279 PMCID: PMC10635123 DOI: 10.1101/2023.11.03.565466] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. This approach is out of step with modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalises these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
Affiliation(s)
- Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Anastasia Ignatieva
- School of Mathematics and Statistics, University of Glasgow, UK
- Department of Statistics, University of Oxford, UK
- Jere Koskela
- School of Mathematics, Statistics and Physics, Newcastle University, UK
- Department of Statistics, University of Warwick, UK
- Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, UK
- Anthony W. Wohns
- Broad Institute of MIT and Harvard, Cambridge, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, USA
- Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
6
Brandt DYC, Huber CD, Chiang CWK, Ortega-Del Vecchyo D. The Promise of Inferring the Past Using the Ancestral Recombination Graph. Genome Biol Evol 2024; 16:evae005. [PMID: 38242694 PMCID: PMC10834162 DOI: 10.1093/gbe/evae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 12/11/2023] [Accepted: 12/17/2023] [Indexed: 01/21/2024] Open
Abstract
The ancestral recombination graph (ARG) is a structure that represents the history of coalescent and recombination events connecting a set of sequences (Hudson RR. In: Futuyma D, Antonovics J, editors. Gene genealogies and the coalescent process. In: Oxford Surveys in Evolutionary Biology; 1991. p. 1-44.). The full ARG can be represented as a set of genealogical trees at every locus in the genome, annotated with recombination events that change the topology of the trees between adjacent loci and the mutations that occurred along the branches of those trees (Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavare S, editors. Progress in population genetics and human evolution. Springer; 1997. p. 257-270.). Valuable insights can be gained into past evolutionary processes, such as demographic events or the influence of natural selection, by studying the ARG. It is regarded as the "holy grail" of population genetics (Hubisz M, Siepel A. Inference of ancestral recombination graphs using ARGweaver. In: Dutheil JY, editors. Statistical population genomics. New York, NY: Springer US; 2020. p. 231-266.) since it encodes the processes that generate all patterns of allelic and haplotypic variation from which all commonly used summary statistics in population genetic research (e.g. heterozygosity and linkage disequilibrium) can be derived. Many previous evolutionary inferences relied on summary statistics extracted from the genotype matrix. Evolutionary inferences using the ARG represent a significant advancement as the ARG is a representation of the evolutionary history of a sample that shows the past history of recombination, coalescence, and mutation events across a particular sequence. This representation in theory contains as much information as, if not more than, the combination of all independent summary statistics that could be derived from the genotype matrix.
Consistent with this idea, some of the first ARG-based analyses have proven to be more powerful than summary statistic-based analyses (Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321-1329.; Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 2019:15(9):e1008384.; Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet. 2020:16(8):e1008895.; Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet. 2022:109(5):812-824.; Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv. 2023.10.10.561787. 2023.; Hejase HA, Mo Z, Campagna L, Siepel A. A deep-learning approach for inference of selective sweeps from the ancestral recombination graph. Mol Biol Evol. 2022:39(1):msab332.; Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv. 2023.04.07.536093. 2023.; Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet. 2023:55(5):768-776.). As such, there has been significant interest in the field to investigate 2 main problems related to the ARG: (i) How can we estimate the ARG based on genomic data, and (ii) how can we extract information of past evolutionary processes from the ARG?
In this perspective, we highlight 3 topics that pertain to these main issues: The development of computational innovations that enable the estimation of the ARG; remaining challenges in estimating the ARG; and methodological advances for deducing evolutionary forces and mechanisms using the ARG. This perspective serves to introduce the readers to the types of questions that can be explored using the ARG and to highlight some of the most pressing issues that must be addressed in order to make ARG-based inference an indispensable tool for evolutionary research.
Affiliation(s)
- Débora Y C Brandt
- Department of Genetics Evolution and Environment, University College London, London, UK
- Christian D Huber
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma De México, Querétaro, Querétaro, Mexico
7
Cousins T, Tabin D, Patterson N, Reich D, Durvasula A. Accurate inference of population history in the presence of background selection. bioRxiv 2024:2024.01.18.576291. [PMID: 38313273 PMCID: PMC10838404 DOI: 10.1101/2024.01.18.576291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
All published methods for learning about demographic history make the simplifying assumption that the genome evolves neutrally, and do not seek to account for the effects of natural selection on patterns of variation. This is a major concern, as ample work has demonstrated the pervasive effects of natural selection and in particular background selection (BGS) on patterns of genetic variation in diverse species. Simulations and theoretical work have shown that methods to infer changes in effective population size over time (Ne(t)) become increasingly inaccurate as the strength of linked selection increases. Here, we introduce an extension to the Pairwise Sequentially Markovian Coalescent (PSMC) algorithm, PSMC+, which explicitly co-models demographic history and natural selection. We benchmark our method using forward-in-time simulations with BGS and find that our approach improves the accuracy of effective population size inference. Leveraging a high resolution map of BGS in humans, we infer considerable changes in the magnitude of inferred effective population size relative to previous reports. Finally, we separately infer Ne(t) on the X chromosome and on the autosomes in diverse great apes without making a correction for selection, and find that the inferred ratio fluctuates substantially through time in a way that differs across species, showing that uncorrected selection may be an important driver of signals of genetic difference on the X chromosome and autosomes.
Affiliation(s)
- Trevor Cousins
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Daniel Tabin
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Howard Hughes Medical Institute, Boston, MA, USA
- Arun Durvasula
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
8
Nait Saada J, Tsangalidou Z, Stricker M, Palamara PF. Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol 2023; 40:msad211. [PMID: 37738175 PMCID: PMC10581698 DOI: 10.1093/molbev/msad211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023] Open
Abstract
Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.
Affiliation(s)
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
9
Laetsch DR, Bisschop G, Martin SH, Aeschbacher S, Setter D, Lohse K. Demographically explicit scans for barriers to gene flow using gIMble. PLoS Genet 2023; 19:e1010999. [PMID: 37816069 PMCID: PMC10610087 DOI: 10.1371/journal.pgen.1010999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/27/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023] Open
Abstract
Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles, i.e. genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for the quantification of barriers that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.
Affiliation(s)
- Dominik R. Laetsch
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Gertjan Bisschop
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon H. Martin
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Simon Aeschbacher
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Derek Setter
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Konrad Lohse
- Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
10
Terbot JW, Johri P, Liphardt SW, Soni V, Pfeifer SP, Cooper BS, Good JM, Jensen JD. Developing an appropriate evolutionary baseline model for the study of SARS-CoV-2 patient samples. PLoS Pathog 2023; 19:e1011265. [PMID: 37018331 PMCID: PMC10075409 DOI: 10.1371/journal.ppat.1011265] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 04/06/2023] Open
Abstract
Over the past 3 years, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has spread through human populations in several waves, resulting in a global health crisis. In response, genomic surveillance efforts have proliferated in the hopes of tracking and anticipating the evolution of this virus, resulting in millions of patient isolates now being available in public databases. Yet, while there is a tremendous focus on identifying newly emerging adaptive viral variants, this quantification is far from trivial. Specifically, multiple co-occurring and interacting evolutionary processes are constantly in operation and must be jointly considered and modeled in order to perform accurate inference. We here outline critical individual components of such an evolutionary baseline model (mutation rates, recombination rates, the distribution of fitness effects, infection dynamics, and compartmentalization) and describe the current state of knowledge pertaining to the related parameters of each in SARS-CoV-2. We close with a series of recommendations for future clinical sampling, model construction, and statistical analysis.
Affiliation(s)
- John W Terbot
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Parul Johri
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Schuyler W Liphardt
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Vivak Soni
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Susanne P Pfeifer
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
- Brandon S Cooper
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey M Good
- University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America
- Jeffrey D Jensen
- Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America
11
Zhou ZJ, Yang CH, Ye SB, Yu XW, Qiu Y, Ge XY. VirusRecom: an information-theory-based method for recombination detection of viral lineages and its application on SARS-CoV-2. Brief Bioinform 2023; 24:6886420. [PMID: 36567622 DOI: 10.1093/bib/bbac513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 10/08/2022] [Accepted: 10/27/2022] [Indexed: 12/27/2022] Open
Abstract
Genomic recombination is an important driving force for viral evolution, and recombination events that significantly alter viral infectivity and transmissibility have been reported for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the Coronavirus Disease 2019 pandemic. However, it is difficult to identify viral recombination, especially for low-divergence viruses such as SARS-CoV-2, since it is hard to distinguish recombination from in situ mutation. Herein, we applied information theory to viral recombination analysis and developed VirusRecom, a program for efficiently screening recombination events in viral genomes. In principle, we considered a recombination event as a transmission process of "information" and introduced weighted information content (WIC) to quantify the contribution of recombination to a certain region of the viral genome; then, we identified the recombination regions by comparing the WICs of different regions. In the benchmark using simulated data, VirusRecom showed a good balance between precision and recall compared to two competing tools, RDP5 and 3SEQ. In the detection of SARS-CoV-2 XE, XD and XF recombinants, VirusRecom provided more accurate positions of recombination regions than RDP5 and 3SEQ. In addition, we encapsulated the VirusRecom program into command-line-interface software for convenient operation by users. In summary, we developed a novel approach based on information theory to identify viral recombination within highly similar sequences, providing a useful tool for monitoring viral evolution and epidemic control.
Affiliation(s)
- Zhi-Jian Zhou
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
- Chen-Hui Yang
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
| | - Sheng-Bao Ye
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
| | - Xiao-Wei Yu
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China.,Hunan Prevention and Treatment Institute for Occupational Diseases, 162 Xinjian W. Rd., Changsha, Hunan, 410000, China
| | - Ye Qiu
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
| | - Xing-Yi Ge
- Hunan Provincial Key Laboratory of Medical Virology, Institute of Pathogen Biology and Immunology, College of Biology, Hunan University, 27 Tianma Rd., Changsha, Hunan, 410012, China
| |
|
12
|
Focosi D, Maggi F. Recombination in Coronaviruses, with a Focus on SARS-CoV-2. Viruses 2022; 14:1239. [PMID: 35746710 PMCID: PMC9228924 DOI: 10.3390/v14061239] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 04/30/2022] [Revised: 06/06/2022] [Accepted: 06/06/2022] [Indexed: 02/07/2023] Open
Abstract
Recombination is a common evolutionary tool for RNA viruses, and coronaviruses are no exception. We review here the evidence for recombination in SARS-CoV-2 and reconcile nomenclature for recombinants, discuss their origin and fitness, and speculate on how recombinants could make a difference in the future of the COVID-19 pandemic.
Affiliation(s)
- Daniele Focosi
  - North-Western Tuscany Blood Bank, Pisa University Hospital, 56124 Pisa, Italy
- Fabrizio Maggi
  - Department of Medicine and Surgery, University of Insubria, 21100 Varese, Italy
|
13
|
Ignatieva A, Hein J, Jenkins PA. Ongoing Recombination in SARS-CoV-2 Revealed through Genealogical Reconstruction. Mol Biol Evol 2022; 39:msac028. [PMID: 35106601 PMCID: PMC8841603 DOI: 10.1093/molbev/msac028] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 11/15/2022] Open
Abstract
The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity. The extent to which recombination is ongoing for SARS-CoV-2 is not yet resolved. To address this, we use a parsimony-based method to reconstruct possible genealogical histories for samples of SARS-CoV-2 sequences, which enables us to pinpoint specific recombination events that could have generated the data. We propose a statistical framework for disentangling the effects of recurrent mutation from recombination in the history of a sample, and hence provide a way of estimating the probability that ongoing recombination is present. We apply this to samples of sequencing data collected in England and South Africa and find evidence of ongoing recombination.
Affiliation(s)
- Jotun Hein
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
  - The Alan Turing Institute, British Library, London, United Kingdom
- Paul A Jenkins
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
  - Department of Computer Science, University of Warwick, Coventry, United Kingdom
|