Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Paul JS, Song YS. Blockwise HMM computation for large-scale population genomic inference. ACTA ACUST UNITED AC 2012;28:2008-15. [PMID: 22641715 DOI: 10.1093/bioinformatics/bts314] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

For:	Paul JS, Song YS. Blockwise HMM computation for large-scale population genomic inference. ACTA ACUST UNITED AC 2012;28:2008-15. [PMID: 22641715 DOI: 10.1093/bioinformatics/bts314] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Number

Cited by Other Article(s)

Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the discrete-time Wright-Fisher model to biobank-scale datasets. Genetics 2023;225:iyad168. [PMID: 37724741 PMCID: PMC10627256 DOI: 10.1093/genetics/iyad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 06/01/2023] [Accepted: 09/08/2023] [Indexed: 09/21/2023] Open

Spence JP, Zeng T, Mostafavi H, Pritchard JK. Scaling the Discrete-time Wright Fisher model to biobank-scale datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.19.541517. [PMID: 37293115 PMCID: PMC10245735 DOI: 10.1101/2023.05.19.541517] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Abstract

The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing like-lihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.

Collapse

Dokmai N, Kockan C, Zhu K, Wang X, Sahinalp SC, Cho H. Privacy-preserving genotype imputation in a trusted execution environment. Cell Syst 2021;12:983-993.e7. [PMID: 34450045 PMCID: PMC8542641 DOI: 10.1016/j.cels.2021.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 07/14/2021] [Accepted: 08/02/2021] [Indexed: 01/02/2023]

Zheng C, Tan L, Sang M, Ye M, Wu R. Genetic adaptation of Tibetan poplar (Populus szechuanica var. tibetica) to high altitudes on the Qinghai-Tibetan Plateau. Ecol Evol 2020;10:10974-10985. [PMID: 33144942 PMCID: PMC7593140 DOI: 10.1002/ece3.6508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Revised: 05/14/2020] [Accepted: 05/28/2020] [Indexed: 12/26/2022] Open

Steinrücken M, Kamm J, Spence JP, Song YS. Inference of complex population histories using whole-genome sequences from multiple populations. Proc Natl Acad Sci U S A 2019;116:17115-17120. [PMID: 31387977 PMCID: PMC6708337 DOI: 10.1073/pnas.1905060116] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Spence JP, Steinrücken M, Terhorst J, Song YS. Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 2018;53:70-76. [PMID: 30056275 PMCID: PMC6296859 DOI: 10.1016/j.gde.2018.07.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Revised: 07/08/2018] [Accepted: 07/09/2018] [Indexed: 01/02/2023]

Hidden Markov Models in Population Genomics. Methods Mol Biol 2018;1552:149-164. [PMID: 28224497 DOI: 10.1007/978-1-4939-6753-7_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]

Landscape Genomics: Understanding Relationships Between Environmental Heterogeneity and Genomic Characteristics of Populations. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/13836_2017_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Terhorst J, Kamm JA, Song YS. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 2017;49:303-309. [PMID: 28024154 PMCID: PMC5470542 DOI: 10.1038/ng.3748] [Citation(s) in RCA: 397] [Impact Index Per Article: 56.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2016] [Accepted: 11/23/2016] [Indexed: 12/20/2022]

Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284-1287. [PMID: 27571263 DOI: 10.1038/ng.3656] [Citation(s) in RCA: 2337] [Impact Index Per Article: 292.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2016] [Accepted: 08/02/2016] [Indexed: 02/07/2023]

Estimating variable effective population sizes from multiple genomes: a sequentially markov conditional sampling distribution approach. Genetics 2013;194:647-62. [PMID: 23608192 DOI: 10.1534/genetics.112.149096] [Citation(s) in RCA: 124] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open

Abstract

Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.

Collapse

Steinrücken M, Paul JS, Song YS. A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theor Popul Biol 2012;87:51-61. [PMID: 23010245 DOI: 10.1016/j.tpb.2012.08.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2012] [Revised: 08/20/2012] [Accepted: 08/28/2012] [Indexed: 10/27/2022]