Reference Citation Analysis

For: Griffiths RC, Marjoram P. Ancestral inference from samples of DNA sequences with recombination. J Comput Biol 1996;3:479-502. [PMID: 9018600 DOI: 10.1089/cmb.1996.3.479] [Citation(s) in RCA: 251] [Impact Index Per Article: 9.0] [Indexed: 02/03/2023]

Cited by Other Article(s)

Allen B, McAvoy A. The coalescent in finite populations with arbitrary, fixed structure. Theor Popul Biol 2024;158:150-169. [PMID: 38880430 DOI: 10.1016/j.tpb.2024.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 06/03/2024] [Accepted: 06/12/2024] [Indexed: 06/18/2024]

Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. Genetics 2024:iyae100. [PMID: 39013109 DOI: 10.1093/genetics/iyae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 06/05/2024] [Indexed: 07/18/2024]

Peng D, Mulder OJ, Edge MD. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. bioRxiv 2024:2024.05.24.595829. [PMID: 38854009 PMCID: PMC11160635 DOI: 10.1101/2024.05.24.595829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]

Abstract

Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.

DeHaas D, Pan Z, Wei X. Genotype Representation Graphs: Enabling Efficient Analysis of Biobank-Scale Data. bioRxiv 2024:2024.04.23.590800. [PMID: 38712040 PMCID: PMC11071416 DOI: 10.1101/2024.04.23.590800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]

Abstract

Computational analysis of a large number of genomes requires a data structure that can represent the dataset compactly while also enabling efficient operations on variants and samples. Current practice is to store large-scale genetic polymorphism data using tabular data structures and file formats, where rows and columns represent samples and genetic variants. However, encoding genetic data in such formats has become unsustainable. For example, the UK Biobank polymorphism data of 200,000 phased whole genomes has exceeded 350 terabytes (TB) in Variant Call Format (VCF), too large to fit into hard drives in uncompressed form. To mitigate the computational burden, we introduce the Genotype Representation Graph (GRG), an extremely compact data structure to losslessly present phased whole-genome polymorphisms. A GRG is a fully connected hierarchical graph that exploits variant-sharing across samples, leveraging ideas inspired by Ancestral Recombination Graphs. Capturing variant-sharing in a graph format compresses biobank-scale data to the point where it can fit in a typical server's RAM (5-26 GB per chromosome), and enables graph-traversal algorithms to trivially reuse computed values, both of which can significantly reduce computation time. We have developed a command-line tool and a library usable via both C++ and Python for constructing and processing GRG files which scales to a million whole genomes. It takes 160 GB of disk space to encode the information in 200,000 UK Biobank phased whole genomes as a GRG, more than 2000 times smaller than the size of VCF. Moreover, the size of GRG increases sublinearly with the number of samples stored, making it a sustainable solution to the increasing number of samples in large datasets. We show that summaries of genetic variants can be computed on GRG via graph traversal that runs 230 times faster than on VCF. We anticipate that GRG-based algorithms will improve the scalability of various types of computation and generally lower the cost of analyzing large genomic datasets.

Wong Y, Ignatieva A, Koskela J, Gorjanc G, Wohns AW, Kelleher J. A general and efficient representation of ancestral recombination graphs. bioRxiv 2024:2023.11.03.565466. [PMID: 37961279 PMCID: PMC10635123 DOI: 10.1101/2023.11.03.565466] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]

Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024;25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]

Lewanski AL, Grundler MC, Bradburd GS. The era of the ARG: An introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics. PLoS Genet 2024;20:e1011110. [PMID: 38236805 PMCID: PMC10796009 DOI: 10.1371/journal.pgen.1011110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]

Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023;110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023]

Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv 2023:2023.10.10.561787. [PMID: 37873208 PMCID: PMC10592779 DOI: 10.1101/2023.10.10.561787] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/25/2023]

Abstract

The demographic history of a population drives the pattern of genetic variation and is encoded in the gene-genealogical trees of the sampled alleles. However, existing methods to infer demographic history from genetic data tend to use relatively low-dimensional summaries of the genealogy, such as allele frequency spectra. As a step toward capturing more of the information encoded in the genome-wide sequence of genealogical trees, here we propose a novel framework called the genealogical likelihood (gLike), which derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes across independent trees the relationships among all lineages in a tree with all possible trajectories of population memberships through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations and empirical applications on populations that have experienced multiple admixtures, we showed that gLike can accurately estimate dozens of demographic parameters when the true genealogy is known, including ancestral population sizes, admixture timing, and admixture proportions. Moreover, when using genealogical trees inferred from genetic data, we showed that gLike outperformed conventional demographic inference methods that leverage only the allele-frequency spectrum and yielded parameter estimates that align with established historical knowledge of the past demographic histories for populations like Latino Americans and Native Hawaiians. Furthermore, our framework can trace ancestral histories by analyzing a sample from the admixed population without proxies for its source populations, removing the need to sample ancestral populations that may no longer exist. Taken together, our proposed gLike framework harnesses underutilized genealogical information to offer exceptional sensitivity and accuracy in inferring complex demographies for humans and other species, particularly as estimation of genome-wide genealogies improves.

Castro LA, Leitner T, Romero-Severson E. Recombination smooths the time signal disrupted by latency in within-host HIV phylogenies. Virus Evol 2023;9:vead032. [PMID: 37397911 PMCID: PMC10313349 DOI: 10.1093/ve/vead032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2022] [Revised: 04/07/2023] [Accepted: 05/15/2023] [Indexed: 07/04/2023]

Abstract

Within-host Human immunodeficiency virus (HIV) evolution involves several features that may disrupt standard phylogenetic reconstruction. One important feature is reactivation of latently integrated provirus, which has the potential to disrupt the temporal signal, leading to variation in the branch lengths and apparent evolutionary rates in a tree. Yet, real within-host HIV phylogenies tend to show clear, ladder-like trees structured by the time of sampling. Another important feature is recombination, which violates the fundamental assumption that evolutionary history can be represented by a single bifurcating tree. Thus, recombination complicates the within-host HIV dynamic by mixing genomes and creating evolutionary loop structures that cannot be represented in a bifurcating tree. In this paper, we develop a coalescent-based simulator of within-host HIV evolution that includes latency, recombination, and effective population size dynamics that allows us to study the relationship between the true, complex genealogy of within-host HIV evolution, encoded as an ancestral recombination graph (ARG), and the observed phylogenetic tree. To compare our ARG results to the familiar phylogeny format, we calculate the expected bifurcating tree after decomposing the ARG into all unique site trees, their combined distance matrix, and the overall corresponding bifurcating tree. While latency and recombination separately disrupt the phylogenetic signal, remarkably, we find that recombination recovers the temporal signal of within-host HIV evolution caused by latency by mixing fragments of old, latent genomes into the contemporary population. In effect, recombination averages over extant heterogeneity, whether it stems from mixed time signals or population bottlenecks. Furthermore, we establish that the signals of latency and recombination can be observed in phylogenetic trees despite being an incorrect representation of the true evolutionary history. Using an approximate Bayesian computation method, we develop a set of statistical probes to tune our simulation model to nine longitudinally sampled within-host HIV phylogenies. Because ARGs are exceedingly difficult to infer from real HIV data, our simulation system allows investigating effects of latency, recombination, and population size bottlenecks by matching decomposed ARGs to real data as observed in standard phylogenies.

Harris K. Using enormous genealogies to map causal variants in space and time. Nat Genet 2023;55:730-731. [PMID: 37127671 PMCID: PMC10350326 DOI: 10.1038/s41588-023-01389-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]

Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]

Fan C, Mancuso N, Chiang CW. A genealogical estimate of genetic relationships. Am J Hum Genet 2022;109:812-824. [PMID: 35417677 DOI: 10.1016/j.ajhg.2022.03.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/24/2021] [Accepted: 03/25/2022] [Indexed: 12/23/2022]

Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022;31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]

Mahmoudi A, Koskela J, Kelleher J, Chan YB, Balding D. Bayesian inference of ancestral recombination graphs. PLoS Comput Biol 2022;18:e1009960. [PMID: 35263345 PMCID: PMC8936483 DOI: 10.1371/journal.pcbi.1009960] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/14/2021] [Revised: 03/21/2022] [Accepted: 02/23/2022] [Indexed: 11/18/2022]

Kreiner JM, Sandler G, Stern AJ, Tranel PJ, Weigel D, Stinchcombe J, Wright SI. Repeated origins, widespread gene flow, and allelic interactions of target-site herbicide resistance mutations. eLife 2022;11:70242. [PMID: 35037853 PMCID: PMC8798060 DOI: 10.7554/elife.70242] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/11/2021] [Accepted: 01/16/2022] [Indexed: 11/13/2022]

Hejase HA, Mo Z, Campagna L, Siepel A. A Deep-Learning Approach for Inference of Selective Sweeps from the Ancestral Recombination Graph. Mol Biol Evol 2022;39:msab332. [PMID: 34888675 PMCID: PMC8789311 DOI: 10.1093/molbev/msab332] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023]

Nadachowska-Brzyska K, Konczal M, Babik W. Navigating the temporal continuum of effective population size. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13740] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]

Alberti F, Herrmann C, Baake E. Selection, recombination, and the ancestral initiation graph. Theor Popul Biol 2021;142:46-56. [PMID: 34520824 DOI: 10.1016/j.tpb.2021.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Revised: 08/17/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022]

Schaefer NK, Shapiro B, Green RE. An ancestral recombination graph of human, Neanderthal, and Denisovan genomes. Sci Adv 2021;7:eabc0776. [PMID: 34272242 PMCID: PMC8284891 DOI: 10.1126/sciadv.abc0776] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 04/04/2020] [Accepted: 06/03/2021] [Indexed: 05/02/2023]

Deng Y, Song YS, Nielsen R. The distribution of waiting distances in ancestral recombination graphs. Theor Popul Biol 2021;141:34-43. [PMID: 34186053 DOI: 10.1016/j.tpb.2021.06.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/30/2020] [Revised: 06/11/2021] [Accepted: 06/16/2021] [Indexed: 11/25/2022]

Alberti F, Baake E, Letter I, Martínez S. Solving the migration-recombination equation from a genealogical point of view. J Math Biol 2021;82:41. [PMID: 33774735 PMCID: PMC8004498 DOI: 10.1007/s00285-021-01584-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2020] [Revised: 10/08/2020] [Accepted: 02/14/2021] [Indexed: 11/29/2022]

Korunes KL, Goldberg A. Human genetic admixture. PLoS Genet 2021;17:e1009374. [PMID: 33705374 PMCID: PMC7951803 DOI: 10.1371/journal.pgen.1009374] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/18/2022]

Stern AJ, Speidel L, Zaitlen NA, Nielsen R. Disentangling selection on genetically correlated polygenic traits via whole-genome genealogies. Am J Hum Genet 2021;108:219-239. [PMID: 33440170 PMCID: PMC7895848 DOI: 10.1016/j.ajhg.2020.12.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 05/21/2020] [Accepted: 12/07/2020] [Indexed: 12/17/2022]

Dilthey AT. State-of-the-art genome inference in the human MHC. Int J Biochem Cell Biol 2021;131:105882. [PMID: 33189874 DOI: 10.1016/j.biocel.2020.105882] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/09/2020] [Revised: 10/29/2020] [Accepted: 11/04/2020] [Indexed: 12/20/2022]

Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet 2020;16:e1008895. [PMID: 32760067 PMCID: PMC7410169 DOI: 10.1371/journal.pgen.1008895] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/08/2019] [Accepted: 05/29/2020] [Indexed: 01/09/2023]

Abstract

The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these “super-archaic” regions—comprising at least about 4Mb—were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.

We present ARGweaver-D, an extension of the ARGweaver algorithm which can be applied under a user-defined demographic model including population splits and migration events. Given genome sequence data from a collection of individuals across multiple closely related populations or subspecies, ARGweaver-D can infer trees describing the genetic relationships among these individuals at every location along the genome, conditional on the demographic model. Like ARGweaver, ARGweaver-D is a Bayesian method, sampling trees from the posterior distribution in order to account for uncertainty. Using simulations, we show that ARGweaver-D can successfully identify regions introgressed from Neanderthals and Denisovans into modern humans. It is also well-powered to detect introgressed regions stemming from older gene-flow events. We apply ARGweaver-D to the genomes of two Neanderthals, a Denisovan, and two African humans. We identify 3% of the Neanderthal genome which is likely derived from gene flow from ancient humans. We also identify about 1% of the Denisovan genome that may be traced to an unsequenced archaic hominin; 15% of these regions were subsequently passed to modern humans. We find no convincing evidence that selection acted against any of these introgressed regions.

Ralph P, Thornton K, Kelleher J. Efficiently Summarizing Relationships in Large Samples: A General Duality Between Statistics of Genealogies and Genomes. Genetics 2020;215:779-797. [PMID: 32357960 PMCID: PMC7337078 DOI: 10.1534/genetics.120.303253] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 09/23/2019] [Accepted: 04/28/2020] [Indexed: 12/11/2022]

Abstract

As a genetic mutation is passed down across generations, it distinguishes those genomes that have inherited it from those that have not, providing a glimpse of the genealogical tree relating the genomes to each other at that site. Statistical summaries of genetic variation therefore also describe the underlying genealogies. We use this correspondence to define a general framework that efficiently computes single-site population genetic statistics using the succinct tree sequence encoding of genealogies and genome sequence. The general approach accumulates sample weights within the genealogical tree at each position on the genome, which are then combined using a summary function; different statistics result from different choices of weight and function. Results can be reported in three ways: by site, which corresponds to statistics calculated as usual from genome sequence; by branch, which gives the expected value of the dual site statistic under the infinite sites model of mutation, and by node, which summarizes the contribution of each ancestor to these statistics. We use the framework to implement many currently defined statistics of genome sequence (making the statistics' relationship to the underlying genealogical trees concrete and explicit), as well as the corresponding branch statistics of tree shape. We evaluate computational performance using simulated data, and show that calculating statistics from tree sequences using this general framework is several orders of magnitude more efficient than optimized matrix-based methods in terms of both run time and memory requirements. We also explore how well the duality between site and branch statistics holds in practice on trees inferred from the 1000 Genomes Project data set, and discuss ways in which deviations may encode interesting biological signals.
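As a rough illustration of the weight-and-summary-function idea in this abstract, the sketch below computes a branch-mode pairwise diversity on a single toy tree in plain Python. The tree, the names `TREE`, `SAMPLES`, `samples_below`, and `branch_diversity`, and the choice of summary function k(n - k) are all assumptions made for illustration; this is not the authors' implementation and it covers one tree only, not a tree sequence.

```python
from math import comb

# Hypothetical toy tree: child -> (parent, length of the branch above child).
# Nodes 0..3 are samples, nodes 4 and 5 are ancestors, node 6 is the root.
TREE = {
    0: (4, 1.0),
    1: (4, 1.0),
    2: (5, 2.0),
    3: (5, 2.0),
    4: (6, 2.0),
    5: (6, 1.0),
}
SAMPLES = [0, 1, 2, 3]


def samples_below(tree, samples):
    """Accumulate a weight of 1 per sample at every node on its path to the root."""
    counts = {}
    for s in samples:
        node = s
        while True:
            counts[node] = counts.get(node, 0) + 1
            if node not in tree:  # reached the root
                break
            node = tree[node][0]
    return counts


def branch_diversity(tree, samples):
    """Mean pairwise branch length separating two samples.

    Each branch contributes its length times k * (n - k), the number of
    sample pairs it separates, where k is the sample count below the branch.
    """
    n = len(samples)
    counts = samples_below(tree, samples)
    total = 0.0
    for child, (_parent, length) in tree.items():
        k = counts.get(child, 0)
        total += length * k * (n - k)
    return total / comb(n, 2)


diversity = branch_diversity(TREE, SAMPLES)
print(diversity)
```

Swapping in a different summary function of k would give a different statistic from the same accumulated weights, which is the design point the abstract describes. Under the infinite sites model, multiplying this branch value by a per-unit-time mutation rate gives the expected value of the corresponding site statistic.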

Gagnaire PA. Comparative genomics approach to evolutionary process connectivity. Evol Appl 2020;13:1320-1334. [PMID: 32684961 PMCID: PMC7359831 DOI: 10.1111/eva.12978] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/27/2019] [Revised: 04/02/2020] [Accepted: 04/03/2020] [Indexed: 01/01/2023]

Abstract

The influence of species life history traits and historical demography on contemporary connectivity is still poorly understood. However, these factors partly determine the evolutionary responses of species to anthropogenic landscape alterations. Genetic connectivity and its evolutionary outcomes depend on a variety of spatially dependent evolutionary processes, such as population structure, local adaptation, genetic admixture, and speciation. Over the last years, population genomic studies have been interrogating these processes with increasing resolution, revealing a large diversity of species responses to spatially structured landscapes. In parallel, multispecies meta-analyses usually based on low-genome coverage data have provided fundamental insights into the ecological determinants of genetic connectivity, such as the influence of key life history traits on population structure. However, comparative studies still lack a thorough integration of macro- and micro-evolutionary scales to fully realize their potential. Here, I present how a comparative genomics framework may provide a deeper understanding of evolutionary process connectivity. This framework relies on coupling the inference of long-term demographic and selective history with an assessment of the contemporary consequences of genetic connectivity. Standardizing this approach across several species occupying the same landscape should help understand how spatial environmental heterogeneity has shaped the diversity of historical and contemporary connectivity patterns in different taxa with contrasted life history traits. I will argue that a reasonable amount of genome sequence data can be sufficient to resolve and connect complex macro- and micro-evolutionary histories. Ultimately, implementing this framework in varied taxonomic groups is expected to improve scientific guidelines for conservation and management policies.

Hejase HA, Dukler N, Siepel A. From Summary Statistics to Gene Trees: Methods for Inferring Positive Selection. Trends Genet 2020;36:243-258. [PMID: 31954511 PMCID: PMC7177178 DOI: 10.1016/j.tig.2019.12.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/28/2019] [Revised: 11/15/2019] [Accepted: 12/11/2019] [Indexed: 01/01/2023]

Harris K. From a database of genomes to a forest of evolutionary trees. Nat Genet 2019;51:1306-1307. [PMID: 31477932 DOI: 10.1038/s41588-019-0492-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]

Barroso GV, Puzović N, Dutheil JY. Inference of recombination maps from a single pair of genomes and its application to ancient samples. PLoS Genet 2019;15:e1008449. [PMID: 31725722 PMCID: PMC6879166 DOI: 10.1371/journal.pgen.1008449] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/17/2019] [Revised: 11/26/2019] [Accepted: 09/30/2019] [Indexed: 12/11/2022]

Kelleher J, Wong Y, Wohns AW, Fadil C, Albers PK, McVean G. Inferring whole-genome histories in large population datasets. Nat Genet 2019;51:1330-1338. [PMID: 31477934 PMCID: PMC6726478 DOI: 10.1038/s41588-019-0483-y] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 02/20/2019] [Accepted: 07/15/2019] [Indexed: 01/01/2023]

Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet 2019;51:1321-1329. [PMID: 31477933 DOI: 10.1038/s41588-019-0484-x] [Citation(s) in RCA: 207] [Impact Index Per Article: 41.4] [Received: 02/20/2019] [Accepted: 07/15/2019] [Indexed: 01/29/2023]

Liao KH, Hon WK, Tang CY, Hsieh WP. MetaSMC: a coalescent-based shotgun sequence simulator for evolving microbial populations. Bioinformatics 2019;35:1677-1685. [PMID: 30321266 DOI: 10.1093/bioinformatics/bty840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/27/2018] [Revised: 09/09/2018] [Accepted: 10/11/2018] [Indexed: 01/26/2023]

Fractional coalescent. Proc Natl Acad Sci U S A 2019;116:6244-6249. [PMID: 30867282 PMCID: PMC6442577 DOI: 10.1073/pnas.1810239116] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Abstract

The fractional coalescent is a generalization of Kingman’s n-coalescent. It facilitates the development of the theory of population genetic processes that deviate from Poisson-distributed waiting times. It also marks the use of methods developed in fractional calculus in population genetics. The fractional coalescent is an extension of the Cannings model, where the variance of the number of offspring per parent is a random variable. The distribution of the number of offspring depends on a parameter α, which is a potential measure of the environmental heterogeneity that is commonly ignored in current inferences.

An approach to the coalescent, the fractional coalescent (f-coalescent), is introduced. The derivation is based on the discrete-time Cannings population model in which the variance of the number of offspring depends on the parameter α. This additional parameter α affects the variability of the patterns of the waiting times; values of α<1 lead to an increase of short time intervals, but occasionally allow for very long time intervals. When α=1, the f-coalescent and Kingman’s n-coalescent are equivalent. The distribution of the time to the most recent common ancestor and the probability that n genes descend from m ancestral genes in a time interval of length T for the f-coalescent are derived. The f-coalescent has been implemented in the population genetic model inference software Migrate. Simulation studies suggest that it is possible to accurately estimate α values from data that were generated with known α values and that the f-coalescent can detect potential environmental heterogeneity within a population. Bayes factor comparisons of simulated data with α<1 and real data (H1N1 influenza and malaria parasites) showed an improved model fit of the f-coalescent over the n-coalescent. The development of the f-coalescent and its inclusion in the inference program Migrate facilitates testing for deviations from the n-coalescent.

Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization. BIOINFORMATICS AND PHYLOGENETICS 2019. [DOI: 10.1007/978-3-030-10837-3_13] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Spence JP, Steinrücken M, Terhorst J, Song YS. Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 2018;53:70-76. [PMID: 30056275 PMCID: PMC6296859 DOI: 10.1016/j.gde.2018.07.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Revised: 07/08/2018] [Accepted: 07/09/2018] [Indexed: 01/02/2023]

Heine K, Beskos A, Jasra A, Balding D, De Iorio M. Bridging trees for posterior inference on ancestral recombination graphs. Proc Math Phys Eng Sci 2018;474:20180568. [PMID: 30602937 PMCID: PMC6304023 DOI: 10.1098/rspa.2018.0568] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 11/01/2018] [Indexed: 11/08/2023] Open

Akita T, Takuno S, Innan H. Coalescent framework for prokaryotes undergoing interspecific homologous recombination. Heredity (Edinb) 2018;120:474-484. [PMID: 29358726 PMCID: PMC5889408 DOI: 10.1038/s41437-017-0034-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2017] [Revised: 10/04/2017] [Accepted: 10/23/2017] [Indexed: 12/11/2022] Open

Koskela J, Jenkins P, Spanò D. Computational Inference Beyond Kingman's Coalescent. J Appl Probab 2018. [DOI: 10.1239/jap/1437658613] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Miroshnikov A, Steinrücken M. Computing the joint distribution of the total tree length across loci in populations with variable size. Theor Popul Biol 2017;118:1-19. [PMID: 28943126 PMCID: PMC5705476 DOI: 10.1016/j.tpb.2017.09.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2016] [Revised: 09/08/2017] [Accepted: 09/13/2017] [Indexed: 11/26/2022]

Kabisch M, Hamann U, Lorenzo Bermejo J. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure. BMC Genomics 2017;18:798. [PMID: 29041903 PMCID: PMC5646149 DOI: 10.1186/s12864-017-4208-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Accepted: 10/12/2017] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Genotypes not directly measured in genetic studies are often imputed to improve statistical power and to increase mapping resolution. The accuracy of standard imputation techniques strongly depends on the similarity of linkage disequilibrium (LD) patterns in the study and reference populations. Here we develop a novel approach for genotype imputation in low-recombination regions that relies on the coalescent and explicitly accounts for population demographic factors. To test the new method, study and reference haplotypes were simulated and gene trees were inferred under the basic coalescent and also considering population growth and structure. The reference haplotypes that first coalesced with study haplotypes were used as templates for genotype imputation. Computer simulations were complemented with the analysis of real data. Genotype concordance rates were used to compare the accuracies of coalescent-based and standard (IMPUTE2) imputation.

RESULTS

Simulations revealed that, in LD-blocks, imputation accuracy relying on the basic coalescent was higher and less variable than with IMPUTE2. Explicit consideration of population growth and structure, even if present, did not practically improve accuracy. The advantage of coalescent-based over standard imputation increased with the minor allele frequency and it decreased with population stratification. Results based on real data indicated that, even in low-recombination regions, further research is needed to incorporate recombination in coalescence inference, in particular for studies with genetically diverse and admixed individuals.

CONCLUSIONS

To exploit the full potential of coalescent-based methods for the imputation of missing genotypes in genetic studies, further methodological research is needed to reduce computer time, to take into account recombination, and to implement these methods in user-friendly computer programs. Here we provide reproducible code which takes advantage of publicly available software to facilitate further developments in the field.

Mirzaei S, Wu Y. RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination. Bioinformatics 2017;33:1021-1030. [PMID: 28065901 PMCID: PMC5860023 DOI: 10.1093/bioinformatics/btw735] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2016] [Revised: 11/11/2016] [Accepted: 12/19/2016] [Indexed: 11/13/2022] Open

Matsieva J, Kelk S, Scornavacca C, Whidden C, Gusfield D. A Resolution of the Static Formulation Question for the Problem of Computing the History Bound. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017;14:404-417. [PMID: 26887004 DOI: 10.1109/tcbb.2016.2527645] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Polanski A, Szczesna A, Garbulowski M, Kimmel M. Coalescence computations for large samples drawn from populations of time-varying sizes. PLoS One 2017;12:e0170701. [PMID: 28170404 PMCID: PMC5295683 DOI: 10.1371/journal.pone.0170701] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2016] [Accepted: 01/09/2017] [Indexed: 11/19/2022] Open

Peever T, Barve M, Stone L, Kaiser W. Evolutionary relationships among Ascochyta species infecting wild and cultivated hosts in the legume tribes Cicereae and Vicieae. Mycologia 2017. [DOI: 10.1080/15572536.2007.11832601] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Wang RJ, Gray MM, Parmenter MD, Broman KW, Payseur BA. Recombination rate variation in mice from an isolated island. Mol Ecol 2016;26:457-470. [PMID: 27864900 DOI: 10.1111/mec.13932] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2016] [Revised: 11/09/2016] [Accepted: 11/14/2016] [Indexed: 01/14/2023]

Inferring Past Effective Population Size from Distributions of Coalescent Times. Genetics 2016;204:1191-1206. [PMID: 27638421 PMCID: PMC5105851 DOI: 10.1534/genetics.115.185058] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2015] [Accepted: 07/20/2016] [Indexed: 01/19/2023] Open

Abstract

Inferring and understanding changes in effective population size over time is a major challenge for population genetics. Here we investigate some theoretical properties of random-mating populations with varying size over time. In particular, we present an exact solution to compute the population size as a function of time, N(t), based on distributions of coalescent times of samples of any size. This result reduces the problem of population size inference to a problem of estimating coalescent time distributions. To illustrate the analytic results, we design a heuristic method using a tree-inference algorithm and investigate simulated and empirical population-genetic data. We investigate the effects of a range of conditions associated with empirical data, for instance number of loci, sample size, mutation rate, and cryptic recombination. We show that our approach performs well with genomic data (≥ 10,000 loci) and that increasing the sample size from 2 to 10 greatly improves the inference of N(t), whereas further increase in sample size results in modest improvements, even under a scenario of exponential growth. We also investigate the impact of recombination and characterize the potential biases in inference of N(t). The approach can handle large sample sizes and the computations are fast. We apply our method to human genomes from four populations and reconstruct population size profiles that are consistent with previous findings, including the Out-of-Africa bottleneck. Additionally, we uncover a potential difference in population size between African and non-African populations as early as 400 KYA. In summary, we provide an analytic relationship between distributions of coalescent times and N(t), which can be incorporated into powerful approaches for inferring past population sizes from population-genomic data.

A coalescent dual process for a Wright-Fisher diffusion with recombination and its application to haplotype partitioning. Theor Popul Biol 2016;112:126-138. [PMID: 27594345 DOI: 10.1016/j.tpb.2016.08.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2016] [Revised: 08/19/2016] [Accepted: 08/25/2016] [Indexed: 11/24/2022]

Abstract

Duality plays an important role in population genetics. It can relate results from forwards-in-time models of allele frequency evolution with those of backwards-in-time genealogical models; a well known example is the duality between the Wright-Fisher diffusion for genetic drift and its genealogical counterpart, the coalescent. There have been a number of articles extending this relationship to include other evolutionary processes such as mutation and selection, but little has been explored for models also incorporating crossover recombination. Here, we derive from first principles a new genealogical process which is dual to a Wright-Fisher diffusion model of drift, mutation, and recombination. The process is reminiscent of the ancestral recombination graph, a widely-used multilocus genealogical model, but here ancestral lineages are typed and transition rates are regarded as being conditioned on an observed configuration at the leaves of the genealogy. Our approach is based on expressing a putative duality relationship between two models via their infinitesimal generators, and then seeking an appropriate test function to ensure the validity of the duality equation. This approach is quite general, and we use it to find dualities for several important variants, including both a discrete L-locus model of a gene and a continuous model in which mutation and recombination events are scattered along the gene according to continuous distributions. As an application of our results, we derive a series expansion for the transition function of the diffusion. Finally, we study in further detail the case in which mutation is absent. Then the dual process describes the dispersal of ancestral genetic material across the ancestors of a sample. The stationary distribution of this process is of particular interest; we show how duality relates this distribution to haplotype fixation probabilities. We develop an efficient method for computing such probabilities in multilocus models.

Inference of Ancestral Recombination Graphs through Topological Data Analysis. PLoS Comput Biol 2016;12:e1005071. [PMID: 27532298 PMCID: PMC4988722 DOI: 10.1371/journal.pcbi.1005071] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2016] [Accepted: 07/20/2016] [Indexed: 12/30/2022] Open