1
Emani PS, Geradi MN, Gürsoy G, Grasty MR, Miranker A, Gerstein MB. Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases. Genome Res 2023; 33:gr.278322.123. [PMID: 38097386; PMCID: PMC10760520; DOI: 10.1101/gr.278322.123]
Abstract
Single nucleotide polymorphisms (SNPs) from omics data create a reidentification risk for individuals and their relatives. Although the ability of thousands of SNPs (especially rare ones) to identify individuals has been repeatedly shown, the availability of small sets of noisy genotypes, from environmental DNA samples or functional genomics data, motivated us to quantify their informativeness. We present a computational tool suite, termed Privacy Leakage by Inference across Genotypic HMM Trajectories (PLIGHT), using population-genetics-based hidden Markov models (HMMs) of recombination and mutation to find piecewise alignment of small, noisy SNP sets to reference haplotype databases. We explore cases in which query individuals are either known to be in the database, or not, and consider several genotype queries, including those from environmental sample swabs from known individuals and from simulated "mosaics" (two-individual composites). Using PLIGHT on a database with ∼5000 haplotypes, we find for common, noise-free SNPs that only ten are sufficient to identify individuals, ∼20 can identify both components in two-individual mosaics, and 20-30 can identify first-order relatives. Using noisy environmental-sample-derived SNPs, PLIGHT identifies individuals in a database using ∼30 SNPs. Even when the individuals are not in the database, local genotype matches allow for some phenotypic information leakage based on coarse-grained SNP imputation. Finally, by quantifying privacy leakage from sparse SNP sets, PLIGHT helps determine the value of selectively sanitizing released SNPs without explicit assumptions about population membership or allele frequency. To make this practical, we provide a sanitization tool to remove the most identifying SNPs from genomic data.
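For intuition, the identification risk described above can be sketched as a naive per-individual scoring of a sparse, noisy genotype query against a database. This is a toy independent-sites stand-in for PLIGHT's recombination-aware HMM alignment; the function name and error model are illustrative assumptions, not the tool's actual interface.

```python
import math

def rank_candidates(query_snps, genotype_db, error_rate=0.05):
    """Score each database individual against a sparse, noisy genotype query.

    query_snps: dict mapping SNP id -> observed genotype (0/1/2).
    genotype_db: dict mapping individual id -> {snp_id: genotype}.
    Each observation matches the stored genotype with probability
    1 - error_rate; missing SNPs count as mismatches. This treats sites as
    independent, unlike PLIGHT's recombination-aware HMM.
    Returns individual ids sorted by descending log-likelihood.
    """
    scores = {}
    for person, genotypes in genotype_db.items():
        ll = 0.0
        for snp, observed in query_snps.items():
            match = genotypes.get(snp) == observed
            ll += math.log(1 - error_rate if match else error_rate / 2)
        scores[person] = ll
    return sorted(scores, key=scores.get, reverse=True)
```

Even with a handful of SNPs, the true individual's score separates quickly from the rest, which is the effect the paper quantifies rigorously.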
Affiliation(s)
- Prashant S Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Maya N Geradi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Monica R Grasty
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Andrew Miranker
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
2
Jin Y, Terhorst J. The solution surface of the Li-Stephens haplotype copying model. Algorithms Mol Biol 2023; 18:12. [PMID: 37559098; PMCID: PMC10410957; DOI: 10.1186/s13015-023-00237-z]
Abstract
The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromosome is an imperfect mosaic of other chromosomes found in a population. In the frequentist setting, which is the focus of this paper, the output of LS is a "copying path" through chromosome space. The behavior of LS depends crucially on two user-specified parameters, θ and ρ, which are respectively interpreted as the rates of mutation and recombination. However, because LS is not based on a realistic model of ancestry, the precise connection between these parameters and the biological phenomena they represent is unclear. Here, we offer an alternative perspective, which considers θ and ρ as tuning parameters, and seeks to understand their impact on the LS output. We derive an algorithm which, for a given dataset, efficiently partitions the (θ, ρ) plane into regions where the output of the algorithm is constant, thereby enumerating all possible solutions to the LS model in one go. We extend this approach to the "diploid LS" model commonly used for phasing. We demonstrate the usefulness of our method by studying the effects of changing θ and ρ when using LS for common bioinformatic tasks. Our findings indicate that using the conventional (i.e., population-scaled) values for θ and ρ produces near-optimal results for imputation, but may systematically inflate switch error in the case of phasing diploid genotypes.
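For readers unfamiliar with the model, a minimal haploid LS Viterbi decoder can be sketched as below, with θ as a per-site miscopy probability and ρ as a per-site switch probability. The transition and emission forms here are simplified placeholders, not the exact parameterization analyzed in the paper.

```python
import numpy as np

def ls_viterbi(query, panel, rho=0.1, theta=0.01):
    """Viterbi copying path for a haploid Li-Stephens HMM (simplified).

    query: length-N array of alleles (0/1); panel: K x N array of reference
    haplotypes. rho: per-site switch probability; theta: per-site miscopy
    (mutation) probability. Returns the most probable sequence of panel
    indices copied at each site.
    """
    K, N = panel.shape
    # Emission log-probabilities: match -> 1 - theta, mismatch -> theta.
    log_emit = np.log(np.where(panel == query[None, :], 1.0 - theta, theta))
    # Uniform prior over the first copied haplotype.
    score = np.log(np.full(K, 1.0 / K)) + log_emit[:, 0]
    back = np.zeros((N, K), dtype=int)
    stay, switch = np.log(1.0 - rho), np.log(rho / K)
    for j in range(1, N):
        # Switch probability is uniform over targets, so the best switch
        # always comes from the single highest-scoring previous state.
        best_prev = int(score.argmax())
        jump = score[best_prev] + switch
        stay_score = score + stay
        back[j] = np.where(stay_score >= jump, np.arange(K), best_prev)
        score = np.maximum(stay_score, jump) + log_emit[:, j]
    # Trace back the copying path from the best final state.
    path = [int(score.argmax())]
    for j in range(N - 1, 0, -1):
        path.append(int(back[j, path[-1]]))
    return path[::-1]
```

Shrinking ρ makes switching costlier and the mosaic coarser; growing θ tolerates more mismatches before a switch pays off, which is exactly the (θ, ρ)-dependence the paper's solution-surface algorithm enumerates exhaustively.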
Affiliation(s)
- Yifan Jin
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA
3
Sanaullah A, Zhi D, Zhang S. Minimal positional substring cover is a haplotype threading alternative to Li and Stephens model. Genome Res 2023; 33:1007-1014. [PMID: 37316352; PMCID: PMC10538481; DOI: 10.1101/gr.277673.123]
Abstract
The Li and Stephens (LS) hidden Markov model (HMM) models the process of reconstructing a haplotype as a mosaic copy of haplotypes in a reference panel. For small panels, the probabilistic parameterization of LS enables modeling the uncertainties of such mosaics. However, LS becomes inefficient when the sample size is large because of its linear time complexity. Recently, the PBWT, an efficient data structure capturing local haplotype matching among haplotypes, was proposed to offer a fast method for obtaining an optimal (Viterbi) solution to the LS HMM. Previously, we introduced the minimal positional substring cover (MPSC) problem as an alternative formulation of LS whose objective is to cover a query haplotype by a minimum number of segments from haplotypes in a reference panel. The MPSC formulation allows the generation of a haplotype threading in time independent of sample size (O(N)). This allows haplotype threading on very large biobank-scale panels on which the LS model is infeasible. Here, we present new results on the solution space of the MPSC. In addition, we derive a number of optimal algorithms for MPSC, including solution enumeration, the length-maximal MPSC, and h-MPSC solutions. In doing so, our algorithms reveal the solution space of LS for large panels. We show that our method is informative in terms of revealing the characteristics of biobank-scale datasets and can improve genotype imputation.
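The cover objective can be illustrated with a naive greedy sketch: starting from the left, repeatedly copy the longest exact match available anywhere in the panel. This quadratic-time illustration (with a hypothetical function name) conveys the MPSC objective only; the paper's algorithms reach the same goal in O(N) using the PBWT.

```python
def greedy_mpsc(query, panel):
    """Greedily cover `query` with segments copied from `panel` haplotypes.

    query: allele sequence; panel: list of equal-length allele sequences.
    At each position, extend the longest exact match available in the panel.
    Returns a list of (haplotype_index, start, end) segments, end exclusive.
    A sketch of the cover objective, not the paper's PBWT-based algorithms.
    """
    n = len(query)
    cover, pos = [], 0
    while pos < n:
        best_hap, best_end = -1, pos
        for h, hap in enumerate(panel):
            end = pos
            # Extend the exact match between query and haplotype h.
            while end < n and hap[end] == query[end]:
                end += 1
            if end > best_end:
                best_hap, best_end = h, end
        if best_hap == -1:  # no haplotype matches even one allele here
            raise ValueError(f"query allele at site {pos} absent from panel")
        cover.append((best_hap, pos, best_end))
        pos = best_end
    return cover
```

The number of segments returned is the quantity MPSC minimizes; the solution-space results in the paper characterize all covers achieving that minimum, not just this single greedy one.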
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
- Degui Zhi
- Center for AI and Genome Informatics, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, Florida 32816, USA
4
Sanaullah A, Zhi D, Zhang S. Minimal Positional Substring Cover: A Haplotype Threading Alternative to Li & Stephens Model. bioRxiv 2023:2023.01.04.522803. [PMID: 36711469; PMCID: PMC9881975; DOI: 10.1101/2023.01.04.522803]
Abstract
The Li & Stephens (LS) hidden Markov model (HMM) models the process of reconstructing a haplotype as a mosaic copy of haplotypes in a reference panel (haplotype threading). For small panels, the probabilistic parameterization of LS enables modeling the uncertainties of such mosaics, and has been the foundational model for haplotype phasing and imputation. However, LS becomes inefficient when sample size is large (tens of thousands to millions) because of its linear time complexity (O(MN), where M is the number of haplotypes and N is the number of sites in the panel). Recently the PBWT, an efficient data structure capturing the local haplotype matching among haplotypes, was proposed to offer fast methods for giving some optimal solution (Viterbi) to the LS HMM. But the solution space of LS for large panels is still elusive. Previously, we introduced the Minimal Positional Substring Cover (MPSC) problem as an alternative formulation of LS whose objective is to cover a query haplotype by a minimum number of segments from haplotypes in a reference panel. The MPSC formulation allows the generation of a haplotype threading in time constant to sample size (O(N)). This allows haplotype threading on very large biobank-scale panels on which the LS model is infeasible. Here we present new results on the solution space of the MPSC, first identifying the property that any MPSC will have a set of required regions, and then proposing an MPSC graph. In addition, we derived a number of optimal algorithms for MPSC, including solution enumerations, the length-maximal MPSC, and h-MPSC solutions. In doing so, our algorithms reveal the solution space of LS for large panels. Even though we only solved an extreme case of LS where the emission probability is 0, our algorithms can be made more robust by PBWT smoothing. We show that our method is informative in terms of revealing the characteristics of biobank-scale data sets and can improve genotype imputation.
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Degui Zhi
- Center for AI and Genome Informatics, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
5
De Marino A, Mahmoud AA, Bose M, Bircan KO, Terpolovsky A, Bamunusinghe V, Bohn S, Khan U, Novković B, Yazdi PG. A comparative analysis of current phasing and imputation software. PLoS One 2022; 17:e0260177. [PMID: 36260643; PMCID: PMC9581364; DOI: 10.1371/journal.pone.0260177]
Abstract
Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed both to reduced sequencing costs and to imputation models that make it possible to obtain nearly whole-genome data from less expensive genotyping methods, such as microarray chips. Although there are many different approaches to imputation, the hidden Markov model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, Shapeit4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation software on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can result in different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest-density chip. IQS and R2 metrics revealed that Impute5 and Minimac4 obtained better results for low-frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load, as measured by run time, was lower for Beagle5.4 than for Minimac4 and Impute5, while Minimac4 utilized the least memory of the imputation tools we compared. Shapeit4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory when phasing WGS data. Finally, we determined the combinations of phasing software, imputation software, and reference panel best suited to different situations and analysis needs, and created an automated pipeline that lets users create customized chips designed to optimize their imputation results.
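The point that metric choice changes the interpretation is easy to see with the simplest metric. Below is a minimal concordance-rate sketch (not the exact IQS or dosage-R2 definitions used in the study) and a toy case showing why concordance flatters imputation at rare variants.

```python
def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotype calls (0/1/2 allele dosages) that agree exactly."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)

# At a rare variant, naively imputing everyone as homozygous reference already
# scores high concordance -- which is why allele-frequency-aware metrics such
# as IQS or dosage r^2 are preferred for low-frequency markers.
truth   = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # one rare carrier among ten samples
imputed = [0] * 10                         # "impute" the major allele everywhere
print(concordance_rate(truth, imputed))    # 0.9 despite missing the only carrier
```

A correlation-based metric would score this imputation at zero, which matches the study's finding that rankings flip between concordance and IQS/R2 for low-frequency markers.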
Affiliation(s)
- Adriano De Marino
- Research & Development, SelfDecode, Miami, FL, United States of America
- Madhuchanda Bose
- Research & Development, SelfDecode, Miami, FL, United States of America
- Sandra Bohn
- Research & Development, SelfDecode, Miami, FL, United States of America
- Umar Khan
- Research & Development, SelfDecode, Miami, FL, United States of America
- Biljana Novković
- Research & Development, SelfDecode, Miami, FL, United States of America
- Puya G. Yazdi
- Research & Development, SelfDecode, Miami, FL, United States of America
6
Huang Y, Ringbauer H. hapCon: Estimating Contamination of Ancient Genomes by Copying from Reference Haplotypes. Bioinformatics 2022; 38:3768-3777. [PMID: 35695771; PMCID: PMC9344841; DOI: 10.1093/bioinformatics/btac390]
Abstract
Motivation: Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverages (<1× average depth), computational methods that can robustly estimate contamination in the low-coverage regime are needed. However, the ultra-low-coverage regime (0.1× and below) remains challenging for existing approaches.
Results: We present a new method to estimate contamination in aDNA for male modern humans. It utilizes a Li & Stephens haplotype copying model for haploid X chromosomes, with mismatches modeled as errors or contamination. We assessed this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our experiments demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low-coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1× for SNP capture data (1240k) and 0.02× for whole-genome sequencing data, substantially extending the coverage limit of previous male X chromosome-based contamination estimation methods. Our experiments demonstrate that hapCon has little bias for contamination up to 25-30% as long as the contaminating source is specified within continental genetic variation, and that its application range extends to human aDNA as old as ∼45,000 years and to various global ancestries.
Availability and implementation: hapCon is available as part of the Python package hapROH, which is hosted on the Python Package Index (https://pypi.org/project/hapROH) and can be installed via pip. The documentation provides example use cases as blueprints for custom applications (https://haproh.readthedocs.io/en/latest/hapCon.html). The program can analyze either BAM files or pileup files produced with samtools. An implementation using Python and C is deposited at https://github.com/hyl317/hapROH.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany
7
Sanaullah A, Zhi D, Zhang S. d-PBWT: dynamic positional Burrows-Wheeler transform. Bioinformatics 2021; 37:2390-2397. [PMID: 33624749; DOI: 10.1093/bioinformatics/btab117]
Abstract
Motivation: Durbin's positional Burrows-Wheeler transform (PBWT) is a scalable data structure for haplotype matching. It has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all shared long segments among all individuals (long matches) and efficient query between an external haplotype and the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel.
Results: Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, where the reverse prefix sorting at each position is stored with linked lists. We also developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT can support all algorithms of PBWT. In doing so, we systematically investigated variations of set maximal match and long match query algorithms: while they all have average-case time complexity independent of database size, they have different worst-case complexities and dependencies on additional data structures.
Availability: The benchmarking code is available at genome.ucf.edu/d-PBWT.
Supplementary information: Supplementary materials are available at Bioinformatics online.
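As background, the static array-based structure that d-PBWT generalizes can be sketched as follows: at each site, haplotype indices are stably partitioned by their allele, maintaining the reverse-prefix sorted order, so haplotypes with long matches ending at a site become adjacent. This is a simplified sketch of the construction for binary panels; d-PBWT replaces these arrays with linked lists to support insertion and deletion.

```python
def pbwt_prefix_arrays(panel):
    """Positional prefix arrays of the static, array-based PBWT.

    panel: list of equal-length binary haplotype sequences.
    Returns a list whose entry k gives the haplotype indices sorted by their
    reversed prefixes ending just before site k; adjacent indices in each
    array are the haplotypes sharing the longest matches ending there.
    """
    M = len(panel)
    N = len(panel[0]) if M else 0
    a = list(range(M))           # sorted order before any site is processed
    arrays = [a[:]]
    for k in range(N):
        zeros, ones = [], []
        # Stable partition by the allele at site k preserves the
        # reverse-prefix order within each allele class.
        for idx in a:
            (zeros if panel[idx][k] == 0 else ones).append(idx)
        a = zeros + ones
        arrays.append(a[:])
    return arrays
```

Each update is O(M) per site, giving O(MN) construction overall; the dynamic version's contribution is updating this sorted order incrementally when a single haplotype is inserted or deleted.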
Affiliation(s)
- Ahsan Sanaullah
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Degui Zhi
- Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
8
Alanko J, Bannai H, Cazaux B, Peterlongo P, Stoye J. Finding all maximal perfect haplotype blocks in linear time. Algorithms Mol Biol 2020; 15:2. [PMID: 32055252; PMCID: PMC7008532; DOI: 10.1186/s13015-020-0163-6]
Abstract
Recent large-scale community sequencing efforts allow the identification, at an unprecedented level of detail, of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing time and therefore are not applicable to current datasets. In 2019, Cunha et al. (Advances in Bioinformatics and Computational Biology: 11th Brazilian Symposium on Bioinformatics, BSB 2018, Niterói, Brazil, October 30 - November 1, 2018; doi:10.1007/978-3-030-01722-4_3) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length, and it was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound: one conceptually very simple, using suffix trees, and a second using the positional Burrows-Wheeler transform that is also very efficient in practice.
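To make the pattern concrete, a maximal perfect haplotype block is a set of at least two haplotypes that are identical over a column interval which cannot be widened without breaking that identity. The brute-force enumeration below (a hypothetical O(M·N^2)-and-worse sketch, nothing like the paper's linear-time suffix-tree and PBWT algorithms) illustrates the definition directly.

```python
from collections import defaultdict

def maximal_perfect_blocks(panel):
    """Naive enumeration of maximal perfect haplotype blocks.

    A block (K, i, j) is a tuple K of >= 2 haplotype indices that are
    identical over columns [i, j]. Grouping by the substring makes K
    row-maximal automatically; we then keep only column-maximal blocks,
    i.e. those that cannot be widened left or right.
    """
    M, N = len(panel), len(panel[0])
    blocks = set()
    for i in range(N):
        for j in range(i, N):
            groups = defaultdict(list)
            for h in range(M):
                groups[tuple(panel[h][i:j + 1])].append(h)
            for members in groups.values():
                if len(members) < 2:
                    continue
                K = tuple(members)
                # Column-maximal: widening in either direction must break
                # the identity of the haplotypes in K.
                left_ok = i == 0 or len({panel[h][i - 1] for h in K}) > 1
                right_ok = j == N - 1 or len({panel[h][j + 1] for h in K}) > 1
                if left_ok and right_ok:
                    blocks.add((K, i, j))
    return blocks
```

Since the number of blocks is itself linear in the input size, the existence of a linear-time enumeration (the paper's result) is plausible; this sketch merely pins down what is being enumerated.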
Affiliation(s)
- Jarno Alanko
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Hideo Bannai
- Department of Informatics, Kyushu University, Fukuoka, Japan
- Bastien Cazaux
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Jens Stoye
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
9
Inferring whole-genome histories in large population datasets. Nat Genet 2019; 51:1330-1338. [PMID: 31477934; PMCID: PMC6726478; DOI: 10.1038/s41588-019-0483-y]
Abstract
Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that not only infers whole-genome histories with accuracy comparable to the state of the art, but also processes four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
10
Novak AM, Garrison E, Paten B. A graph extension of the positional Burrows-Wheeler transform and its applications. Algorithms Mol Biol 2017; 12:18. [PMID: 28702075; PMCID: PMC5505026; DOI: 10.1186/s13015-017-0109-9]
Abstract
We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that, with the gPBWT, on the order of 100,000 diploid genomes, including all forms of structural variation, could be stored and made searchable for haplotype queries using a single large compute node.
Affiliation(s)
- Adam M. Novak
- Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA
- Erik Garrison
- Wellcome Trust Sanger Institute, Cambridge, CB10 1SA UK
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, CBSE, 501C Engineering 2, MS: CBSE, 1156 High St., Santa Cruz, CA 95064 USA
11
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060; DOI: 10.1016/j.gde.2016.08.007]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it explains only a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our review addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide-ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
- Department of Human Genetics, University of Chicago, IL 60636, United States; Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
- Benjamin M Peter
- Department of Human Genetics, University of Chicago, IL 60636, United States