Reference Citation Analysis

For: Al Bkhetan Z, Zobel J, Kowalczyk A, Verspoor K, Goudey B. Exploring effective approaches for haplotype block phasing. BMC Bioinformatics 2019;20:540. [PMID: 31666002 PMCID: PMC6822470 DOI: 10.1186/s12859-019-3095-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/25/2019] [Accepted: 09/10/2019] [Indexed: 01/19/2023]

Cited by Other Article(s)

Sivabharathi RC, Rajagopalan VR, Suresh R, Sudha M, Karthikeyan G, Jayakanthan M, Raveendran M. Haplotype-based breeding: A new insight in crop improvement. Plant Sci 2024;346:112129. [PMID: 38763472 DOI: 10.1016/j.plantsci.2024.112129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/09/2024] [Accepted: 05/15/2024] [Indexed: 05/21/2024]

Montero-Tena JA, Abdollahi Sisi N, Kox T, Abbadi A, Snowdon RJ, Golicz AA. haploMAGIC: accurate phasing and detection of recombination in multiparental populations despite genotyping errors. G3 (Bethesda) 2024;14:jkae109. [PMID: 38808682 PMCID: PMC11304941 DOI: 10.1093/g3journal/jkae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 02/12/2024] [Accepted: 05/08/2024] [Indexed: 05/30/2024]

van der Burg LLJ, de Wreede LC, Baldauf H, Sauter J, Schetelig J, Putter H, Böhringer S. Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region. Genet Epidemiol 2024;48:3-26. [PMID: 37830494 DOI: 10.1002/gepi.22538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 09/06/2023] [Accepted: 09/25/2023] [Indexed: 10/14/2023]

Abstract

Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a region is the one containing the killer-cell immunoglobulin-like receptor (KIR) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation-maximization (EM) algorithm to estimate haplotype frequencies (HTFs) that handles the missing data and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information from multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM algorithm show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study.
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
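The core EM step and the rare-haplotype collapsing described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2), models an ambiguous call as `None`, assumes Hardy-Weinberg equilibrium, and omits the gene-by-gene iteration and the profiling of the rare-group frequency. The function names `compatible_pairs`, `em_haplotype_freqs`, and `collapse_rare` and the 0.05 threshold are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return every unordered haplotype pair consistent with a genotype.

    genotype: per-site alternate-allele counts (0, 1, 2); None marks an
    ambiguous call, where any allele pair is allowed (hypothetical coding).
    """
    per_site = []
    for g in genotype:
        if g is None:
            per_site.append([(0, 0), (0, 1), (1, 0), (1, 1)])
        elif g == 0:
            per_site.append([(0, 0)])
        elif g == 2:
            per_site.append([(1, 1)])
        else:  # heterozygous site: either phase is possible
            per_site.append([(0, 1), (1, 0)])
    pairs = set()
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))  # store as an unordered pair
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in candidates:
            # E-step: weight of each compatible pair under current frequencies
            weights = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs


def collapse_rare(freqs, threshold=0.05):
    """Pool haplotypes with frequency below `threshold` into one 'rare' group."""
    kept = {h: f for h, f in freqs.items() if f >= threshold}
    kept["rare"] = sum(f for f in freqs.values() if f < threshold)
    return kept
```

For example, with five (0, 0) homozygotes, five (2, 2) homozygotes and four double heterozygotes, the homozygotes pull the ambiguous individuals toward the (0, 0)/(1, 1) phase, and `collapse_rare` then pools the near-zero (0, 1) and (1, 0) haplotypes into the rare group.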

Wragg D, Zhang W, Peterson S, Yerramilli M, Mellanby R, Schoenebeck JJ, Clements DN. A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy. Genet Sel Evol 2024;56:6. [PMID: 38216889 PMCID: PMC10785484 DOI: 10.1186/s12711-024-00875-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/03/2024] [Indexed: 01/14/2024]

Abstract

BACKGROUND

Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low coverage sequencing and reference-guided imputation on genotype concordance and association analyses.

RESULTS

DNA isolated from saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and down-sampled from 43.5X to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs), and two subsets of the former panel (256 dogs each) where one had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and ≥ 3.8X depths of coverage, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths.

CONCLUSIONS

We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. Low-pass sequencing and imputation require caution as incorrect allele assignments result when the subject possesses alleles that are absent in the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that impact haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).

Herzig AF, Velo-Suárez L, Dina C, Redon R, Deleuze JF, Génin E. How local reference panels improve imputation in French populations. Sci Rep 2024;14:370. [PMID: 38172507 PMCID: PMC10764714 DOI: 10.1038/s41598-023-49931-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]

Abstract

Imputation servers offer exclusive access to the largest public reference panels, which have been shown to deliver very high precision in the imputation of European genomes. Many studies have nonetheless stressed the importance of 'study specific panels' (SSPs) as an alternative and have shown the benefits of combining public reference panels with SSPs. But such combined approaches are not attainable when using external imputation servers. To investigate how to confront this challenge, we imputed 550 French individuals using either the University of Michigan imputation server with the Haplotype Reference Consortium (HRC) panel or an in-house SSP of 850 whole-genome sequenced French individuals. With approximate geo-localization of both our target and SSP individuals, we are able to pinpoint different scenarios where SSP-based imputation would be preferred over server-based imputation or vice versa. This is achieved by showing, at high resolution, the importance of the proximity of the reference panel to target individuals, with a focus on the clear added value of SSPs for estimating haplotype phase and for the imputation of rare variants (minor allele frequency below 0.01). Such benefits were most evident for individuals from the same geographical regions in France as the SSP individuals. Overall, only 42.3% of all 125,442 variants evaluated were better imputed with an SSP from France compared to an external reference panel; however, this rises to 58.1% for individuals from geographic regions well covered by the SSP. By investigating haplotype sharing and population fine-structure in France, we show the importance of including SSP haplotypes for imputation but also that they should ideally be combined with large public panels.
In the absence of the unattainable results from a combined panel of the HRC and our French SSP, we put forward a pragmatic solution where server-based and SSP-based imputation outcomes can be combined based on comparing posterior genotype probabilities. We show that such an approach can give a level of imputation accuracy in excess of what could be achieved with either strategy alone. The results presented provide detailed insights into the accuracy of imputation that should be expected from different strategies for European populations.
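The abstract says server-based and SSP-based outcomes are combined by comparing posterior genotype probabilities but does not spell out the rule here. The sketch below assumes the simplest such rule: for each individual and variant, keep the imputation whose largest posterior genotype probability is higher, breaking ties in favor of the server. The function names `combine_by_posterior` and `dosage` and the tie-breaking are hypothetical, not the authors' method.

```python
def combine_by_posterior(server_probs, ssp_probs):
    """Per genotype, keep whichever imputation is more confident.

    Each element is a posterior triple (P(0), P(1), P(2)) for one variant in
    one individual. Confidence is taken to be the largest posterior; ties go
    to the server result. This selection rule is an illustrative assumption.
    """
    if len(server_probs) != len(ssp_probs):
        raise ValueError("both inputs must cover the same genotypes")
    combined, sources = [], []
    for server, ssp in zip(server_probs, ssp_probs):
        if max(server) >= max(ssp):
            combined.append(server)
            sources.append("server")
        else:
            combined.append(ssp)
            sources.append("ssp")
    return combined, sources


def dosage(probs):
    """Expected alternate-allele count, P(1) + 2 * P(2)."""
    return probs[1] + 2 * probs[2]
```

A maximum-posterior rule is only one option; comparing per-variant imputation quality scores would be an equally plausible variant of the same idea.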

Xiao N, Cao X, Liu Z, Han Y. Two germline mutations can serve as genetic susceptibility screening makers for a lung adenocarcinoma family. J Cancer Res Clin Oncol 2023;149:6541-6548. [PMID: 36781503 DOI: 10.1007/s00432-023-04616-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 01/27/2023] [Indexed: 02/15/2023]

Shipilina D, Pal A, Stankowski S, Chan YF, Barton NH. On the origin and structure of haplotype blocks. Mol Ecol 2023;32:1441-1457. [PMID: 36433653 PMCID: PMC10946714 DOI: 10.1111/mec.16793] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/13/2022] [Revised: 11/16/2022] [Accepted: 11/18/2022] [Indexed: 11/27/2022]

Liu D, Peter BM, Schiefenhövel W, Kayser M, Stoneking M. Assessing human genome-wide variation in the Massim region of Papua New Guinea and implications for the Kula trading tradition. Mol Biol Evol 2022;39:6653776. [PMID: 35920169 PMCID: PMC9372566 DOI: 10.1093/molbev/msac165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Jighly A. When do autopolyploids need poly-sequencing data? Mol Ecol 2021;31:1021-1027. [PMID: 34875138 DOI: 10.1111/mec.16313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2021] [Revised: 11/23/2021] [Accepted: 12/01/2021] [Indexed: 12/17/2022]

Bhat JA, Yu D, Bohra A, Ganie SA, Varshney RK. Features and applications of haplotypes in crop breeding. Commun Biol 2021;4:1266. [PMID: 34737387 PMCID: PMC8568931 DOI: 10.1038/s42003-021-02782-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 05/21/2021] [Accepted: 10/09/2021] [Indexed: 12/17/2022]

Kulski JK, Suzuki S, Shiina T. Haplotype Shuffling and Dimorphic Transposable Elements in the Human Extended Major Histocompatibility Complex Class II Region. Front Genet 2021;12:665899. [PMID: 34122517 PMCID: PMC8193847 DOI: 10.3389/fgene.2021.665899] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2021] [Accepted: 04/12/2021] [Indexed: 12/26/2022]

Abstract

The major histocompatibility complex (MHC) on chromosome 6p21 is one of the most single-nucleotide polymorphism (SNP)-dense regions of the human genome and a prime model for the study and understanding of conserved sequence polymorphisms and structural diversity of ancestral haplotypes/conserved extended haplotypes. This study aimed to follow up on a previous analysis of the MHC class I region by using the same set of 95 MHC haplotype sequences downloaded from a publicly available BioProject database at the National Center for Biotechnology Information to identify and characterize the polymorphic human leukocyte antigen (HLA)-class II genes, the MTCO3P1 pseudogene alleles, the indels of transposable elements as haplotypic lineage markers, and SNP-density crossover (XO) loci at haplotype junctions in DNA sequence alignments of different haplotypes across the extended class II region (∼1 Mb) from the telomeric PRRT1 gene in class III to the COL11A2 gene at the centromeric end of class II. We identified 42 haplotypic indels (20 Alu, 7 SVA, 13 LTR or MERs, and 2 indels composed of a mosaic of different transposable elements) linked to particular HLA-class II alleles. Comparative sequence analyses of 136 haplotype pairs revealed 98 unique XO sites between SNP-poor and SNP-rich genomic segments with considerable haplotype shuffling located in the proximity of putative recombination hotspots. The majority of XO sites occurred across various regions, including in the vicinity of MTCO3P1 between HLA-DQB1 and HLA-DQB3, between HLA-DQB2 and HLA-DOB, between DOB and TAP2, and between HLA-DOA and HLA-DPA1, where most XOs were within a HERVK22 sequence. We also determined the genomic positions of the PRDM9-recombination suppression sequence motif ATCCATG/CATGGAT and the PRDM9 recombination activation partial binding motif CCTCCCCT/AGGGGAG in the class II region of the human reference genome (NC_000006) relative to published meiotic recombination positions.
Both the recombination and anti-recombination PRDM9 binding motifs were widely distributed throughout the class II genomic regions with 50% or more found within repeat elements; the anti-recombination motifs were found mostly in L1 fragmented repeats. This study shows substantial haplotype shuffling between different polymorphic blocks and confirms the presence of numerous putative ancestral recombination sites across the class II region between various HLA class II genes.

Srivastava K, Fratzscher AS, Lan B, Flegel WA. Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database. BMC Bioinformatics 2021;22:273. [PMID: 34039276 PMCID: PMC8150616 DOI: 10.1186/s12859-021-04169-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/18/2020] [Accepted: 05/04/2021] [Indexed: 12/18/2022]

Abstract

Background

Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next-generation sequencing data became available.

Study design and methods

Phased genotype data for an 80.6 kb region of chromosome 1 were downloaded for all 2,504 unrelated individuals of the 1000 Genomes Project Phase 3 cohort. The data were centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed for extracting these haplotypes from the 1000 Genomes Project in an automated fashion. A manual analysis validated the data extracted by the algorithm.
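The assignment rule in this paragraph can be written as a short check. This is a sketch, not the authors' algorithm: it assumes biallelic sites coded as alternate-allele counts (0, 1, 2) and evaluates one fixed window, whereas the study reports haplotypes of varying lengths. The function name `unambiguous_haplotypes` is hypothetical.

```python
def unambiguous_haplotypes(genotype):
    """Return both haplotypes if they follow from the genotype alone, else None.

    genotype: per-site alternate-allele counts (0, 1, 2). With no heterozygous
    site the two haplotypes are identical; with exactly one, they differ only
    at that site, so no phasing is needed in either case.
    """
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("genotype values must be 0, 1 or 2")
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    if len(het_sites) > 1:
        return None  # phase between two or more heterozygous sites is unknown
    hap1 = tuple(1 if g == 2 else 0 for g in genotype)
    if not het_sites:
        return hap1, hap1  # complete homozygosity
    hap2 = list(hap1)
    hap2[het_sites[0]] = 1  # the single heterozygous site carries the alt allele
    return hap1, tuple(hap2)
```

Genotypes with two or more heterozygous sites return `None` because their haplotypes would depend on computational phasing, which is exactly what this approach avoids.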

Results

We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and the shortest at 1,901 nucleotides. The combined length of haplotype sequences comprised 19,895,388 nucleotides with a median of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing.

Conclusions

Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large-scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high-throughput analysis with next-generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04169-6.

Al Bkhetan Z, Chana G, Soon Ong C, Goudey B, Ramamohanarao K. eQTLHap: a tool for comprehensive eQTL analysis considering haplotypic and genotypic effects. Brief Bioinform 2021;22:6214641. [PMID: 33834181 DOI: 10.1093/bib/bbab093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2020] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 11/13/2022]

A SNaPshot Assay for Determination of the Mannose-Binding Lectin Gene Variants and an Algorithm for Calculation of Haplogenotype Combinations. Diagnostics (Basel) 2021;11:diagnostics11020301. [PMID: 33668563 PMCID: PMC7918147 DOI: 10.3390/diagnostics11020301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Revised: 02/07/2021] [Accepted: 02/11/2021] [Indexed: 11/16/2022]

Al Bkhetan Z, Chana G, Ramamohanarao K, Verspoor K, Goudey B. Evaluation of consensus strategies for haplotype phasing. Brief Bioinform 2020;22:5998997. [PMID: 33236761 DOI: 10.1093/bib/bbaa280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/13/2020] [Revised: 09/22/2020] [Accepted: 09/22/2020] [Indexed: 01/05/2023]

Lutgen D, Ritter R, Olsen R, Schielzeth H, Gruselius J, Ewels P, García JT, Shirihai H, Schweizer M, Suh A, Burri R. Linked‐read sequencing enables haplotype‐resolved resequencing at population scale. Mol Ecol Resour 2020;20:1311-1322. [DOI: 10.1111/1755-0998.13192] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/16/2020] [Revised: 04/25/2020] [Accepted: 05/06/2020] [Indexed: 11/28/2022]