1
Temple SD, Thompson EA. Identity-by-descent segments in large samples. bioRxiv 2024:2024.06.05.597656. [PMID: 38895476 PMCID: PMC11185678 DOI: 10.1101/2024.06.05.597656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
If two haplotypes share the same alleles for an extended gene tract, these haplotypes are likely to derive identical-by-descent from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved tree and recombination processes, which commonly presents challenges to the derivation of theoretical results in population genetics. Under interpretable regularity conditions, we show that the proportion of detectable identity-by-descent segments at a locus is normally distributed for large sample size and large scaled population size. We use efficient and exact simulations to study the distributional behavior of the detectable identity-by-descent rate in finite samples. One consequence of non-normality in finite samples is that genome-wide scans based on identity-by-descent rates may be subject to anti-conservative Type 1 error control.
Highlights
- We show the asymptotic normality of the identity-by-descent rate, a mean of correlated binary random variables that arises in population genetics studies.
- We describe an efficient algorithm capable of simulating long identity-by-descent segments around a locus in large sample sizes.
- In enormous simulation studies, we use this algorithm to characterize the distributional properties of the identity-by-descent rate.
- In finite samples, we reject the null hypothesis of normality more often than the nominal significance level, indicating that genome-wide scans based on identity-by-descent rates may be anti-conservative.
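The identity-by-descent rate described in this abstract is a mean of binary indicators, one per haplotype pair. A minimal Python sketch of that quantity follows. The segment tuple layout, the function name `ibd_rate`, and the use of a simple length threshold to define "detectable" are assumptions made for illustration and are not taken from the paper.

```python
def ibd_rate(segments, n_haplotypes, locus, min_length):
    """Proportion of haplotype pairs with a detectable IBD segment at `locus`.

    `segments` is a list of (hap_a, hap_b, start, end) tuples (hypothetical
    layout). A segment counts as detectable here if it covers the locus and
    its length is at least `min_length` (an assumed detection rule).
    """
    covered = set()
    for a, b, start, end in segments:
        if a == b:
            continue  # a haplotype is not paired with itself
        if start <= locus <= end and (end - start) >= min_length:
            covered.add((min(a, b), max(a, b)))  # unordered pair
    n_pairs = n_haplotypes * (n_haplotypes - 1) // 2
    return len(covered) / n_pairs if n_pairs else 0.0


segments = [
    (0, 1, 10.0, 14.5),  # covers the locus and is long enough
    (0, 2, 11.0, 12.0),  # covers the locus but is too short
    (2, 3, 9.0, 13.0),   # covers the locus and is long enough
    (1, 3, 20.0, 30.0),  # does not cover the locus
]
rate = ibd_rate(segments, n_haplotypes=4, locus=12.0, min_length=2.0)
print(rate)  # 2 of the 6 possible pairs, i.e. about 0.333
```

Because pairs that share a haplotype are not independent, the indicators being averaged are correlated, which is why the normality of this mean is not a routine consequence of the central limit theorem.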
2
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668 PMCID: PMC11023918 DOI: 10.1016/j.ajhg.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
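The linear output size claimed above can be illustrated with simple counting: one cluster label per haplotype at a locus implicitly encodes every IBD pair within each cluster. The sketch below is only an arithmetic illustration. The list-of-labels representation and the name `implied_pairs` are hypothetical and do not reflect the ibd-cluster output format.

```python
from collections import Counter


def implied_pairs(cluster_labels):
    """Number of pairwise IBD relationships implied by per-haplotype labels.

    A cluster of k haplotypes stands for k * (k - 1) / 2 pairs, so n labels
    can encode on the order of n**2 pairwise relationships.
    """
    sizes = Counter(cluster_labels)
    return sum(k * (k - 1) // 2 for k in sizes.values())


labels = ["A", "A", "A", "B", "B", "C"]  # one label per haplotype at a locus
n_pairs = implied_pairs(labels)
print(len(labels), n_pairs)  # 6 labels encode 3 + 1 + 0 = 4 pairs
```

For a single cluster of 1,000 haplotypes, the same function returns 499,500 pairs from 1,000 stored labels.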
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA.
3
Berkowitz E, Falik Zaccai TC, Irge D, Gur I, Tiosano B, Kesler A. A genetic survey of patients with familial idiopathic intracranial hypertension residing in a Middle Eastern village: genetic association study. Eur J Med Res 2024; 29:194. [PMID: 38528581 DOI: 10.1186/s40001-024-01800-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 03/18/2024] [Indexed: 03/27/2024]
Abstract
BACKGROUND: The aim of this study was to determine whether genetic variants are associated with idiopathic intracranial hypertension (IIH) in a unique village where many of the IIH patients have familial ties, a homogeneous population and a high prevalence of consanguinity. Several autosomal recessive disorders are common in this village and its population is considered at a high risk for genetic disorders.
METHODS: The samples were genotyped with the Illumina OmniExpress-24 Kit and analyzed with the Eagle v2.4 and DASH software packages to cluster haplotypes shared within our cohort. Subsequently, we searched for specific haplotypes that were significantly associated with the patient groups.
RESULTS: Fourteen patients and 30 controls were included. Samples from 22 female participants (11 patients and 11 controls) were evaluated for haplotype clustering and genome-wide association studies (GWAS). A total of 710,000 single nucleotide polymorphisms (SNPs) were evaluated. Candidate regions positively associated with IIH included genes located on chromosomes 16 and 8 (including the CA5A and BANP genes, p < 0.01); regions negatively associated with IIH included genes located on chromosomes 1 and 6 (including the PBX1, LMX1A and ESR1 genes, p < 0.01).
CONCLUSIONS: We discovered new loci possibly associated with IIH by employing a GWAS technique to estimate the associations with haplotypes instead of specific SNPs. This method can in all probability be used in cases where there is a limited number of samples but strong familial connections. Several loci were identified that might be strong candidates for follow-up studies in other well-phenotyped cohorts.
Affiliation(s)
- Eran Berkowitz
- Department of Ophthalmology, Hillel Yaffe Medical Center, 1 Ha-Shalom Street, 38100, Hadera, Israel.
- The Adelson School of Medicine, Ariel University, Ariel, Israel.
- The Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel.
- Dana Irge
- Genetic Institute, Meir Medical Center, Kfar Saba, Israel
- Inbar Gur
- Department of Ophthalmology, Hillel Yaffe Medical Center, 1 Ha-Shalom Street, 38100, Hadera, Israel
- Beatrice Tiosano
- Department of Ophthalmology, Hillel Yaffe Medical Center, 1 Ha-Shalom Street, 38100, Hadera, Israel
- Anat Kesler
- Department of Ophthalmology, Hillel Yaffe Medical Center, 1 Ha-Shalom Street, 38100, Hadera, Israel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- The Adelson School of Medicine, Ariel University, Ariel, Israel
4
Ji Q, Yao Y, Li Z, Zhou Z, Qian J, Tang Q, Xie J. Characterizing identity by descent segments in Chinese interpopulation unrelated individual pairs. Mol Genet Genomics 2024; 299:37. [PMID: 38494535 DOI: 10.1007/s00438-024-02132-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 02/22/2024] [Indexed: 03/19/2024]
Abstract
Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. A great deal of research focuses on IBD segments between related pairs, while statistical analyses of segments in unrelated individuals are rare. In this study, we investigated the basic informative features of IBD segments in unrelated pairs in Chinese populations from the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, and the average segment length was 3.71 Mb. It was found that 17.86% of unrelated pairs shared at least one IBD segment in the Chinese cohort. Furthermore, a total of 49 chromosomal regions where IBD segments clustered in high abundance were identified, which might be sharing hotspots in the human genome. Such regions could also be observed in other ancestry populations, which implies that similar IBD backgrounds also exist. Altogether, these results demonstrated the distribution of common background IBD segments, which helps improve the accuracy in pedigree studies based on IBD analysis.
Affiliation(s)
- Qiqi Ji
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhimin Li
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Jinglei Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China.
5
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ± 3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
6
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. bioRxiv 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
7
Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, Gaynor SM, Joseph T, Zou Y, Liu D, Wade R, Staples J, Panea R, Popov A, Bai X, Balasubramanian S, Habegger L, Lanche R, Lopez A, Maxwell E, Jones M, García-Ortiz H, Ramirez-Reyes R, Santacruz-Benítez R, Nag A, Smith KR, Damask A, Lin N, Paulding C, Reppell M, Zöllner S, Jorgenson E, Salerno W, Petrovski S, Overton J, Reid J, Thornton TA, Abecasis G, Berumen J, Orozco-Orozco L, Collins R, Baras A, Hill MR, Emberson JR, Marchini J, Kuri-Morales P, Tapia-Conyer R. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023; 622:784-793. [PMID: 37821707 PMCID: PMC10600010 DOI: 10.1038/s41586-023-06595-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/27/2022] [Accepted: 08/31/2023] [Indexed: 10/13/2023]
Abstract
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Affiliation(s)
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- Jesús Alegre-Díaz
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Michael Turner
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Oxford Kidney Unit, Churchill Hospital, Oxford, UK
- Yuxin Zou
- Regeneron Genetics Center, Tarrytown, NY, USA
- Daren Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Rachel Wade
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Alex Popov
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alex Lopez
- Regeneron Genetics Center, Tarrytown, NY, USA
- Raul Ramirez-Reyes
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rogelio Santacruz-Benítez
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Abhishek Nag
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Katherine R Smith
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Amy Damask
- Regeneron Genetics Center, Tarrytown, NY, USA
- Nan Lin
- Regeneron Genetics Center, Tarrytown, NY, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
- Michael R Hill
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan R Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Pablo Kuri-Morales
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico.
8
Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet 2023; 55:768-776. [PMID: 37127670 PMCID: PMC10181934 DOI: 10.1038/s41588-023-01379-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/10/2021] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Affiliation(s)
- Brian C Zhang
- Department of Statistics, University of Oxford, Oxford, UK
- Arjun Biddanda
- Department of Statistics, University of Oxford, Oxford, UK
- Árni Freyr Gunnarsson
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
9
Shemirani R, Belbin GM, Burghardt K, Lerman K, Avery CL, Kenny EE, Gignoux CR, Ambite JL. Selecting Clustering Algorithms for Identity-By-Descent Mapping. Pacific Symposium on Biocomputing 2023; 28:121-132. [PMID: 36540970 PMCID: PMC9782725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime three orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios. Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.
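To make the notion of a local IBD graph concrete, the sketch below builds the graph at one locus (haplotypes as nodes, an edge for each shared segment covering the locus) and groups it with connected components via union-find. This is a deliberately simple stand-in, not Infomap or MCL as evaluated in the paper, and the segment tuple layout and the name `local_ibd_clusters` are assumptions for illustration.

```python
def local_ibd_clusters(segments, locus):
    """Connected components of the local IBD graph at `locus`.

    `segments` is a list of (hap_a, hap_b, start, end) tuples (hypothetical
    layout). Haplotypes with no segment covering the locus are omitted.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, start, end in segments:
        if start <= locus <= end:  # segment contributes an edge at this locus
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


segments = [
    ("h1", "h2", 0, 10),
    ("h2", "h3", 5, 15),
    ("h4", "h5", 8, 12),
    ("h1", "h6", 20, 30),  # does not cover the locus below
]
clusters = local_ibd_clusters(segments, locus=9)
print(clusters)  # [['h1', 'h2', 'h3'], ['h4', 'h5']]
```

Connected components merge any two groups joined by a single edge, including a falsely detected one, which is one reason community detection methods that weigh edge density within groups may be preferred in practice.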
Affiliation(s)
- Ruhollah Shemirani
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
10
Tang K, Naseri A, Wei Y, Zhang S, Zhi D. Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts. Gigascience 2022; 11:giac111. [PMID: 36472573 PMCID: PMC9724555 DOI: 10.1093/gigascience/giac111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 08/04/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
In the recent biobank era of genetics, the problem of identical-by-descent (IBD) segment detection received renewed interest, as IBD segments in large cohorts offer unprecedented opportunities in the study of population and genealogical history, as well as genetic association of long haplotypes. While a new generation of efficient methods for IBD segment detection becomes available, direct comparison of these methods is difficult: existing benchmarks were often evaluated in different datasets, with some not openly accessible; methods benchmarked were run under suboptimal parameters; and benchmark performance metrics were not defined consistently. Here, we developed a comprehensive and completely open-source evaluation of the power, accuracy, and resource consumption of these IBD segment detection methods using realistic population genetic simulations with various settings. Our results pave the road for fair evaluation of IBD segment detection methods and provide a practical guide for users.
Affiliation(s)
- Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
11
Valikhova LV, Kharkov VN, Zarubin AA, Kolesnikov NA, Svarovskaya MG, Khitrinskaya IY, Shtygasheva OV, Volkov VG, Stepanov VA. Genetic Interrelation of the Chulym Turks with Khakass and Kets according to Autosomal SNP Data and Y-Chromosome Haplogroups. Russ J Genet 2022. [DOI: 10.1134/s1022795422100118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
12
Revealing the recent demographic history of Europe via haplotype sharing in the UK Biobank. Proc Natl Acad Sci U S A 2022; 119:e2119281119. [PMID: 35696575 PMCID: PMC9233301 DOI: 10.1073/pnas.2119281119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023]
Abstract
Haplotype-based analyses have recently been leveraged to interrogate the fine-scale structure in specific geographic regions, notably in Europe, although an equivalent haplotype-based understanding across the whole of Europe with these tools is lacking. Furthermore, study of identity-by-descent (IBD) sharing in a large sample of haplotypes across Europe would allow a direct comparison between different demographic histories of different regions. The UK Biobank (UKBB) is a population-scale dataset of genotype and phenotype data collected from the United Kingdom, with established sampling of worldwide ancestries. The exact content of these non-UK ancestries is largely uncharacterized, where study could highlight valuable intracontinental ancestry references with deep phenotyping within the UKBB. In this context, we sought to investigate the sample of European ancestry captured in the UKBB. We studied the haplotypes of 5,500 UKBB individuals with a European birthplace; investigated the population structure and demographic history in Europe, showing in parallel the variety of footprints of demographic history in different genetic regions around Europe; and expand knowledge of the genetic landscape of the east and southeast of Europe. Providing an updated map of European genetics, we leverage IBD-segment sharing to explore the extent of population isolation and size across the continent. In addition to building and expanding upon previous knowledge in Europe, our results show the UKBB as a source of diverse ancestries beyond Britain. These worldwide ancestries sampled in the UKBB may complement and inform researchers interested in specific communities or regions not limited to Britain.
13
Yue W, Naseri A, Wang V, Shakya P, Zhang S, Zhi D. P-smoother: efficient PBWT smoothing of large haplotype panels. Bioinformatics Advances 2022; 2:vbac045. [PMID: 35785021 PMCID: PMC9245627 DOI: 10.1093/bioadv/vbac045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 05/03/2022] [Accepted: 06/15/2022] [Indexed: 01/27/2023]
Abstract
Motivation: As large haplotype panels become increasingly available, efficient string matching algorithms such as positional Burrows-Wheeler transformation (PBWT) are promising for identifying shared haplotypes. However, recent mutations and genotyping errors create occasional mismatches, presenting challenges for exact haplotype matching. Previous solutions are based on probabilistic models or seed-and-extension algorithms that passively tolerate mismatches.
Results: Here, we propose a PBWT-based smoothing algorithm, P-smoother, to actively 'correct' these mismatches and thus 'smooth' the panel. P-smoother runs a bidirectional PBWT-based panel scanning that flips mismatching alleles based on the overall haplotype matching context, which we call the IBD (identical-by-descent) prior. In a simulated panel with 4000 haplotypes and a 0.2% error rate, we show it can reliably correct 85% of errors. As a result, PBWT algorithms running over the smoothed panel can identify more pairwise IBD segments than over the unsmoothed panel. Most strikingly, a PBWT-cluster algorithm running over the smoothed panel, which we call PS-cluster, achieves state-of-the-art performance for identifying multiway IBD segments, a challenging problem in the computational community for years. We also showed that PS-cluster is adequately efficient for UK Biobank data. Therefore, P-smoother opens up new possibilities for efficient error-tolerating algorithms for biobank-scale haplotype panels.
Availability and implementation: Source code is available at github.com/ZhiGroup/P-smoother.
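For readers unfamiliar with PBWT, the core idea is an ordering of haplotypes by their reversed prefixes, updated site by site with a stable partition on the current allele. The sketch below shows only that basic ordering step for a biallelic 0/1 panel. It is not the P-smoother algorithm, and the function name `pbwt_order` and the list-of-lists panel layout are assumptions for illustration.

```python
def pbwt_order(haplotypes):
    """Order haplotype indices by reversed prefix after the last site.

    `haplotypes` is a list of equal-length 0/1 lists. At each site the
    current order is stably partitioned: allele 0 first, then allele 1.
    Haplotypes that match over a long stretch ending at the last site
    end up adjacent in the returned order.
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for k in range(n_sites):
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones  # stable partition keeps earlier ordering
    return order


panel = [
    [0, 1, 0, 1],  # haplotype 0
    [1, 1, 0, 1],  # haplotype 1: matches haplotype 0 over the last 3 sites
    [0, 0, 1, 0],  # haplotype 2
    [1, 0, 1, 0],  # haplotype 3: matches haplotype 2 over the last 3 sites
]
order = pbwt_order(panel)
print(order)  # [2, 3, 0, 1]
```

A single mismatching allele inside an otherwise long match separates two haplotypes in this ordering, which is the sensitivity to errors that the abstract describes as motivation for smoothing.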
Affiliation(s)
- William Yue
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ardalan Naseri
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Victor Wang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Pramesh Shakya
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Degui Zhi
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
Collapse
|
14
|
Belbin GM, Rutledge S, Dodatko T, Cullina S, Turchin MC, Kohli S, Torre D, Yee MC, Gignoux CR, Abul-Husn NS, Houten SM, Kenny EE. Leveraging health systems data to characterize a large effect variant conferring risk for liver disease in Puerto Ricans. Am J Hum Genet 2021; 108:2099-2111. [PMID: 34678161 PMCID: PMC8595966 DOI: 10.1016/j.ajhg.2021.09.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2021] [Accepted: 09/28/2021] [Indexed: 12/22/2022] Open
Abstract
The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.
Affiliation(s)
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
- Stephanie Rutledge
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Tetyana Dodatko
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sinead Cullina
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael C Turchin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sumita Kohli
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Muh-Ching Yee
- Stanford Functional Genomics Facility, Stanford University, Stanford, CA 94305, USA
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sander M Houten
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
|
15
|
Sticca EL, Belbin GM, Gignoux CR. Current Developments in Detection of Identity-by-Descent Methods and Applications. Front Genet 2021; 12:722602. [PMID: 34567074 PMCID: PMC8461052 DOI: 10.3389/fgene.2021.722602] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Accepted: 08/24/2021] [Indexed: 01/23/2023] Open
Abstract
Identity-by-descent (IBD), the detection of shared segments inherited from a common ancestor, is a fundamental concept in genomics with broad applications in the characterization and analysis of genomes. While historically the concept of IBD was extensively utilized through linkage analyses and in studies of founder populations, applications of IBD-based methods subsided during the genome-wide association study era. This was primarily due to the computational expense of IBD detection, which becomes increasingly relevant as the field moves toward the analysis of biobank-scale datasets that encompass individuals from highly diverse backgrounds. To address these computational barriers, the past several years have seen new methodological advances enabling IBD detection for datasets in the hundreds of thousands to millions of individuals, enabling novel analyses at an unprecedented scale. Here, we describe the latest innovations in IBD detection and describe opportunities for the application of IBD-based methods across a broad range of questions in the field of genomics.
Affiliation(s)
- Evan L Sticca
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Gillian M Belbin
- Institute for Genomic Health, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Christopher R Gignoux
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
|
16
|
Warshauer EM, Brown A, Fuentes I, Shortt J, Gignoux C, Montinaro F, Metspalu M, Youssefian L, Vahidnezhad H, Jacków J, Christiano AM, Uitto J, Fajardo-Ramírez ÓR, Salas-Alanis JC, McGrath JA, Consuegra L, Rivera C, Maier PA, Runfeldt G, Behar DM, Skorecki K, Sprecher E, Palisson F, Norris DA, Bruckner AL, Kogut I, Bilousova G, Roop DR. Ancestral patterns of recessive dystrophic epidermolysis bullosa mutations in Hispanic populations suggest Sephardic ancestry. Am J Med Genet A 2021; 185:3390-3400. [PMID: 34435747 DOI: 10.1002/ajmg.a.62456] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/24/2021] [Revised: 07/11/2021] [Accepted: 07/14/2021] [Indexed: 11/11/2022]
Abstract
Recessive dystrophic epidermolysis bullosa (RDEB) is a rare genodermatosis caused by mutations in the gene coding for type VII collagen (COL7A1). More than 800 different pathogenic mutations in COL7A1 have been described to date; however, the ancestral origins of many of these mutations have not been precisely identified. In this study, 32 RDEB patient samples from the Southwestern United States, Mexico, Chile, and Colombia carrying common mutations in the COL7A1 gene were investigated to determine the origins of these mutations and the extent to which shared ancestry contributes to disease prevalence. The results demonstrate both shared European and American origins of RDEB mutations in distinct populations in the Americas and suggest the influence of Sephardic ancestry in at least some RDEB mutations of European origins. Knowledge of ancestry and relatedness among RDEB patient populations will be crucial for the development of future clinical trials and the advancement of novel therapeutics.
Affiliation(s)
- Emily Mira Warshauer
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA; Charles C. Gates Center for Regenerative Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Adam Brown
- Avotaynu Research Partnership LLC, Englewood, New Jersey, USA
- Ignacia Fuentes
- Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Universidad del Desarrollo, Santiago, Chile; Fundación DEBRA Chile, Santiago, Chile
- Jonathan Shortt
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Chris Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Francesco Montinaro
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia; Department of Biology and Genetics, University of Bari, Bari, Italy
- Mait Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Leila Youssefian
- Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College and Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Hassan Vahidnezhad
- Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College and Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Joanna Jacków
- Department of Dermatology, Columbia University, New York, New York, USA; St. John's Institute of Dermatology, King's College London (Guy's Campus), London, UK
- Angela M Christiano
- Department of Dermatology, Columbia University, New York, New York, USA; Department of Genetics and Development, Columbia University, New York, New York, USA
- Jouni Uitto
- Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College and Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Óscar R Fajardo-Ramírez
- DEBRA Mexico, Azteca Guadalupe, Mexico; Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Mexico
- Julio C Salas-Alanis
- DEBRA Mexico, Azteca Guadalupe, Mexico; Instituto Dermatologico de Jalisco, Zapopan, Mexico
- John A McGrath
- St. John's Institute of Dermatology, King's College London (Guy's Campus), London, UK
- Carolina Rivera
- Fundación DEBRA Colombia, Bogotá, Colombia; Department of Medical Genetics, Pediatric Hospital, Fundacion Cardioinfantil-Universidad del Rosario, Bogotá, Colombia
- Paul A Maier
- Gene by Gene, Genomic Research Center, Houston, Texas, USA
- Goran Runfeldt
- Gene by Gene, Genomic Research Center, Houston, Texas, USA
- Doron M Behar
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia; Gene by Gene, Genomic Research Center, Houston, Texas, USA
- Karl Skorecki
- Azrieli Faculty of Medicine of the Galilee, Bar-Ilan University, Safed, Israel
- Eli Sprecher
- Department of Dermatology, Tel-Aviv Sourasky Medical Center, Tel Aviv, Israel; Department of Human Molecular Genetics, Sackler Faculty of Medicine, Tel-Aviv University, Tel Aviv, Israel
- Francis Palisson
- Fundación DEBRA Chile, Santiago, Chile; Facultad de Medicina Clínica Alemana Universidad del Desarrollo, Santiago, Chile
- David A Norris
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA; Charles C. Gates Center for Regenerative Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Anna L Bruckner
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Igor Kogut
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA; Charles C. Gates Center for Regenerative Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Ganna Bilousova
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA; Charles C. Gates Center for Regenerative Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Dennis R Roop
- Department of Dermatology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA; Charles C. Gates Center for Regenerative Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
|
17
|
Matallana-Ramirez LP, Whetten RW, Sanchez GM, Payn KG. Breeding for Climate Change Resilience: A Case Study of Loblolly Pine (Pinus taeda L.) in North America. Frontiers in Plant Science 2021; 12:606908. [PMID: 33995428 PMCID: PMC8119900 DOI: 10.3389/fpls.2021.606908] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/16/2020] [Accepted: 04/08/2021] [Indexed: 05/25/2023]
Abstract
Earth's atmosphere is warming and the effects of climate change are becoming evident. A key observation is that both the average levels and the variability of temperature and precipitation are changing. Information and data from new technologies are developing in parallel to provide multidisciplinary opportunities to address and overcome the consequences of these changes in forest ecosystems. Changes in temperature and water availability impose multidimensional environmental constraints that trigger changes from the molecular to the forest stand level. These can represent a threat for the normal development of the tree from early seedling recruitment to adulthood both through direct mortality, and by increasing susceptibility to pathogens, insect attack, and fire damage. This review summarizes the strengths and shortcomings of previous work in the areas of genetic variation related to cold and drought stress in forest species with particular emphasis on loblolly pine (Pinus taeda L.), the most-planted tree species in North America. We describe and discuss the implementation of management and breeding strategies to increase resilience and adaptation, and discuss how new technologies in the areas of engineering and genomics are shaping the future of phenotype-genotype studies. Lessons learned from the study of species important in intensively-managed forest ecosystems may also prove to be of value in helping less-intensively managed forest ecosystems adapt to climate change, thereby increasing the sustainability and resilience of forestlands for the future.
Affiliation(s)
- Lilian P. Matallana-Ramirez
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States
- Ross W. Whetten
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States
- Georgina M. Sanchez
- Center for Geospatial Analytics, North Carolina State University, Raleigh, NC, United States
- Kitt G. Payn
- Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC, United States
|
18
|
Sapin E, Keller MC. Novel Approach for Parallelizing Pairwise Comparison Problems as Applied to Detecting Segments Identical By Decent in Whole-Genome Data. Bioinformatics 2021; 37:2121-2125. [PMID: 33705528 PMCID: PMC8352502 DOI: 10.1093/bioinformatics/btab084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2020] [Revised: 11/09/2020] [Accepted: 03/09/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, and so operations that require pairwise comparisons (either on pairs of SNPs or pairs of individuals) are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n² comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization. RESULTS: We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science.
Affiliation(s)
- Emmanuel Sapin
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
|
19
|
Nait Saada J, Kalantzis G, Shyr D, Cooper F, Robinson M, Gusev A, Palamara PF. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 2020; 11:6130. [PMID: 33257650 PMCID: PMC7704644 DOI: 10.1038/s41467-020-19588-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 04/02/2020] [Accepted: 10/02/2020] [Indexed: 12/14/2022] Open
Abstract
Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.
Affiliation(s)
- Derek Shyr
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Martin Robinson
- Department of Computer Science, University of Oxford, Oxford, UK
- Alexander Gusev
- Brigham & Women's Hospital, Division of Genetics, Boston, MA, 02215, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
|
20
|
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022] Open
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
|
21
|
Zhu SJ, Hendry JA, Almagro-Garcia J, Pearson RD, Amato R, Miles A, Weiss DJ, Lucas TC, Nguyen M, Gething PW, Kwiatkowski D, McVean G. The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria. eLife 2019; 8:e40845. [PMID: 31298657 PMCID: PMC6684230 DOI: 10.7554/elife.40845] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/09/2018] [Accepted: 07/10/2019] [Indexed: 02/07/2023] Open
Abstract
Individual malaria infections can carry multiple strains of Plasmodium falciparum with varying levels of relatedness. Yet, how local epidemiology affects the properties of such mixed infections remains unclear. Here, we develop an enhanced method for strain deconvolution from genome sequencing data, which estimates the number of strains, their proportions, identity-by-descent (IBD) profiles and individual haplotypes. Applying it to the Pf3k data set, we find that the rate of mixed infection varies from 29% to 63% across countries and that 51% of mixed infections involve more than two strains. Furthermore, we estimate that 47% of symptomatic dual infections contain sibling strains likely to have been co-transmitted from a single mosquito, and find evidence of mixed infections propagated over successive infection cycles. Finally, leveraging data from the Malaria Atlas Project, we find that prevalence correlates within Africa, but not Asia, with both the rate of mixed infection and the level of IBD.
Affiliation(s)
- Sha Joe Zhu
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Jason A Hendry
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Jacob Almagro-Garcia
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Richard D Pearson
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Roberto Amato
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Alistair Miles
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Daniel J Weiss
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Tim Cd Lucas
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Michele Nguyen
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Peter W Gething
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Dominic Kwiatkowski
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
|
22
|
Reconstructing recent population history while mapping rare variants using haplotypes. Sci Rep 2019; 9:5849. [PMID: 30971755 PMCID: PMC6458133 DOI: 10.1038/s41598-019-42385-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/21/2018] [Accepted: 03/28/2019] [Indexed: 12/11/2022] Open
Abstract
Haplotype-based methods are a cost-effective alternative for characterizing unobserved rare variants and mapping disease-associated alleles. Moreover, they can be used to reconstruct recent population history, which shaped the distribution of rare variants and thus can be used to guide gene mapping studies. In this study, we analysed an Illumina 650k genotyped dataset on three underrepresented populations from Eastern Europe, where ancestors of Russians came into contact with two indigenous ethnic groups, Bashkirs and Tatars. Using the IBD mapping approach, we identified two rare IBD haplotypes strongly enriched in asthma patients of distinct ethnic backgrounds. We reconstructed recent population history using haplotype-based methods to reconcile this contradictory finding. Our ChromoPainter analysis showed that these haplotypes each descend from a single ancestor coming from one of the ethnic groups studied. Next, we used the DoRIS approach and showed that the source populations for patients exchanged recent (<60 generations) asymmetric gene flow, which supported the ChromoPainter-based scenario that patients share haplotypes through inter-ethnic admixture. Finally, we show that these IBD haplotypes overlap with asthma-associated genomic regions ascertained in European populations. This finding is consistent with the fact that the two donor populations for the rare IBD haplotypes, Russians and Tatars, have European ancestry.
|
23
|
Szatkiewicz J, Crowley JJ, Adolfsson AN, Åberg KA, Alaerts M, Genovese G, McCarroll S, Del-Favero J, Adolfsson R, Sullivan PF. The genomics of major psychiatric disorders in a large pedigree from Northern Sweden. Transl Psychiatry 2019; 9:60. [PMID: 30718465 PMCID: PMC6362018 DOI: 10.1038/s41398-019-0414-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/12/2018] [Revised: 01/15/2019] [Accepted: 01/17/2019] [Indexed: 11/08/2022] Open
Abstract
We searched for genetic causes of major psychiatric disorders (bipolar disorder, schizoaffective disorder, and schizophrenia) in a large, densely affected pedigree from Northern Sweden that originated with three pairs of founders born around 1650. We applied a systematic genomic approach to the pedigree via karyotyping (N = 9), genome-wide SNP arrays (N = 418), whole-exome sequencing (N = 26), and whole-genome sequencing (N = 10). Comprehensive analysis did not identify plausible variants of strong effect. Rather, pedigree cases had significantly higher genetic risk scores compared to pedigree and community controls.
Affiliation(s)
- Jin Szatkiewicz
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- James J Crowley
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Karolina A Åberg
- Center for Biomarker Research and Precision Medicine, Virginia Commonwealth University, Richmond, VA, USA
- Maaike Alaerts
- Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
- Jurgen Del-Favero
- VIB Center for Molecular Neurology, Universiteitsplein 1, Antwerp, Belgium and Multiplicom N.V., Galileilaan 18, Niel, Belgium
- Rolf Adolfsson
- Department of Clinical Sciences and Psychiatry, University of Umeå, Umeå, Sweden.
- Patrick F Sullivan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
|
24
|
High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 2018; 50:1311-1317. [PMID: 30104759 PMCID: PMC6145075 DOI: 10.1038/s41588-018-0177-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/21/2017] [Accepted: 06/21/2018] [Indexed: 12/19/2022]
Abstract
Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequence data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified LD score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.
|
25
|
Hsueh WC, Nair AK, Kobes S, Chen P, Göring HHH, Pollin TI, Malhotra A, Knowler WC, Baier LJ, Hanson RL. Identity-by-Descent Mapping Identifies Major Locus for Serum Triglycerides in Amerindians Largely Explained by an APOC3 Founder Mutation. Circ Cardiovasc Genet 2018; 10:CIRCGENETICS.117.001809. [PMID: 29237685 DOI: 10.1161/circgenetics.117.001809] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/01/2017] [Accepted: 10/03/2017] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Identity-by-descent mapping using empirical estimates of identity-by-descent allele sharing may be useful for studies of complex traits in founder populations, where hidden relationships may augment the inherent genetic information that can be used for localization. METHODS AND RESULTS: Through identity-by-descent mapping, using ≈400 000 single-nucleotide polymorphisms (SNPs), of serum lipid profiles, we identified a major linkage signal for triglycerides in 1007 Pima Indians (LOD=9.23; P=3.5×10⁻¹¹ on chromosome 11q). In subsequent fine-mapping and replication association studies in ≈7500 Amerindians, we determined that this signal reflects effects of a loss-of-function Ala43Thr substitution in APOC3 (rs147210663) and 3 established functional SNPs in APOA5. The association with rs147210663 was particularly strong; each copy of the Thr allele conferred 42% lower triglycerides (β=−0.92±0.059 SD unit; P=9.6×10⁻⁵⁵ in 4668 Pimas and 2793 Southwest Amerindians combined). The Thr allele is extremely rare in most global populations but has a frequency of 2.5% in Pimas. We further demonstrated that 3 APOA5 SNPs with established functional impact could explain the association with the most well-replicated SNP (rs964184) for triglycerides identified by genome-wide association studies. Collectively, these 4 SNPs account for 6.9% of variation in triglycerides in Pimas (and 4.1% in Southwest Amerindians), and their inclusion in the original linkage model reduced the linkage signal to virtually null. CONCLUSIONS: APOC3/APOA5 constitutes a major locus for serum triglycerides in Amerindians, especially the Pimas, and these results provide an empirical example for the concept that population-based linkage analysis is a useful strategy to identify complex trait variants.
Affiliation(s)
- Wen-Chi Hsueh, Anup K Nair, Sayuko Kobes, Peng Chen, Harald H H Göring, Toni I Pollin, Alka Malhotra, William C Knowler, Leslie J Baier, Robert L Hanson
- From the Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, AZ (W.-C.H., A.K.N., S.K., P.C., A.M., W.C.K., L.J.B., R.L.H.); South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, San Antonio (H.H.H.G.); Departments of Medicine and Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore (T.I.P.); and Illumina Inc, San Diego, CA (A.M.).
|
26
|
Belbin GM, Odgis J, Sorokin EP, Yee MC, Kohli S, Glicksberg BS, Gignoux CR, Wojcik GL, Van Vleck T, Jeff JM, Linderman M, Schurmann C, Ruderfer D, Cai X, Merkelson A, Justice AE, Young KL, Graff M, North KE, Peters U, James R, Hindorff L, Kornreich R, Edelmann L, Gottesman O, Stahl EE, Cho JH, Loos RJ, Bottinger EP, Nadkarni GN, Abul-Husn NS, Kenny EE. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife 2017; 6:25060. [PMID: 28895531 PMCID: PMC5595434 DOI: 10.7554/elife.25060] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/11/2017] [Accepted: 08/09/2017] [Indexed: 11/16/2022] Open
Abstract
Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease. Diseases often run in families. These diseases are frequently linked to changes in DNA that are passed down through generations. Close family members may share these disease-causing mutations; so may distant relatives who inherited the same mutation from a common ancestor long ago. Geneticists use a method called linkage mapping to trace a disease found in multiple members of a family over generations to genetic changes in a shared ancestor. This allows scientists to pinpoint the exact place in the genome the disease-causing mutation occurred. Using computer algorithms, scientists can apply the same technique to identify mutations that distant relatives inherited from a common ancestor. Belbin et al. used this computational technique to identify a mutation that may cause unusually short stature or bone and joint problems in up to 2% of people of Puerto Rican descent.
In the experiments, the genomes of about 32,000 New Yorkers who have volunteered to participate in the BioMe Biobank and their health records were used to search for genetic changes linked to extremely short stature. The search revealed that people who inherited two copies of this mutation from their parents were likely to be extremely short or to have bone and joint problems. People who inherited one copy had an increased likelihood of joint or bone problems. This mutation affects a gene responsible for making a form of protein called collagen that is important for bone growth. The analysis suggests the mutation first arose in a Native American ancestor living in Puerto Rico around the time that European colonization began. The mutation had previously been linked to a disorder called Steel syndrome that was thought to be rare. Belbin et al. showed this condition is actually fairly common in people whose ancestors recently came from Puerto Rico, but may often go undiagnosed by their physicians. The experiments emphasize the importance of including diverse populations in genetic studies, as studies of people of predominantly European descent would likely have missed the link between this disease and mutation.
Affiliation(s)
- Gillian Morven Belbin
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Jacqueline Odgis
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Elena P Sorokin
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
| | - Muh-Ching Yee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, United States
| | - Sumita Kohli
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Benjamin S Glicksberg
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States
| | - Christopher R Gignoux
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
| | - Genevieve L Wojcik
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
| | - Tielman Van Vleck
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Janina M Jeff
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Michael Linderman
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Claudia Schurmann
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Douglas Ruderfer
- Broad Institute, Cambridge, United States; Division of Psychiatric Genomics, Icahn School of Medicine at Mt Sinai, New York, United States; Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
| | - Xiaoqiang Cai
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Amanda Merkelson
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Anne E Justice
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
| | - Kristin L Young
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
| | - Misa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
| | - Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States; Department of Epidemiology, University of Washington School of Public Health, Seattle, United States
| | - Regina James
- National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, United States
| | - Lucia Hindorff
- National Human Genome Research Institute, National Institutes of Health, Bethesda, United States
| | - Ruth Kornreich
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Lisa Edelmann
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Omri Gottesman
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Eli Ea Stahl
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States; Broad Institute, Cambridge, United States
| | - Judy H Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Ruth Jf Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Erwin P Bottinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Noura S Abul-Husn
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Eimear E Kenny
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
|
27
|
Liu XQ, Fazio J, Hu P, Paterson AD. Identity-by-descent mapping for diastolic blood pressure in unrelated Mexican Americans. BMC Proc 2016; 10:263-267. [PMID: 27980647 PMCID: PMC5133517 DOI: 10.1186/s12919-016-0041-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023] Open
Abstract
Population-based identity by descent (IBD) mapping is a statistical method for detection of genetic loci that share an ancestral segment among “unrelated” pairs of individuals for a disease. As a complementary method to genome-wide association studies, IBD mapping is robust to allelic heterogeneity and may identify rare inherited variants when combined with sequence data. Our objective is to identify the causal genes for diastolic blood pressure (DBP). We applied a population-based IBD mapping method to 105 unrelated individuals selected from the family data provided for the Genetic Analysis Workshop 19. Using the genome-wide association study data (i.e., the microarray data), chromosome 3 was scanned for IBD sharing segments among all pairs of these individuals. At the chromosomal region with the most significant relationship between IBD sharing and DBP, the whole genome sequence data were examined to identify the risk variants for DBP. The most significant chromosomal region that was identified to have a relationship between the IBD sharing and DBP was at 3q12.3 (p = 0.0016), although it did not achieve the chromosome-wide significance level (p = 0.00012). This chromosomal region contains 1 gene, ZPLD1, which has been reported to be associated with cerebral cavernous malformations, a disease with enlarged small blood vessels (capillaries) in the brain. Although 24 deleterious variants were identified at this region, no significant association was found between these variants and DBP (p = 0.40). We presented a mapping strategy which combined a population-based IBD mapping method with sequence data analyses. One gene was located at a chromosomal region identified by this method for DBP. However, further study with a large sample size is needed to assess this result.
Affiliation(s)
- Xiao-Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; The Children's Hospital Research Institute of Manitoba, Winnipeg, MB R3E 3P4 Canada
| | - Jillian Fazio
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada
| | - Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; George and Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, MB R3A 1R9 Canada
| | - Andrew D Paterson
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4 Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5G 0A4 Canada
|
28
|
Genetic variations associated with six-white-point coat pigmentation in Diannan small-ear pigs. Sci Rep 2016; 6:27534. [PMID: 27270507 PMCID: PMC4897638 DOI: 10.1038/srep27534] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 02/05/2016] [Accepted: 05/18/2016] [Indexed: 11/08/2022] Open
Abstract
A common phenotypic difference among domestic animals is variation in coat color. Six-white-point is a pigmentation pattern observed in varying pig breeds, which seems to have evolved through several different mechanistic pathways. Herein, we re-sequenced whole genomes of 31 Diannan small-ear pigs from China and found that the six-white-point coat color in Diannan small-ear pigs is likely regulated by polygenic loci, rather than by the MC1R locus. Strong associations were observed at three loci (EDNRB, CNTLN, and PINK1), which explain about 20 percent of the total coat color variance in the Diannan small-ear pigs. We found a mutation that is highly differentiated between six-white-point and black Diannan small-ear pigs, which is located in a conserved noncoding sequence upstream of the EDNRB gene and is a putative binding site of the CEBPB protein. This study advances our understanding of coat color evolution in Diannan small-ear pigs and expands our traditional knowledge of coat color being a monogenic trait.
|
29
|
Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3: Genes, Genomes, Genetics 2016; 6:1287-96. [PMID: 26935417 PMCID: PMC4856080 DOI: 10.1534/g3.116.027581] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/18/2022]
Abstract
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to share an IBD segment if that segment is inherited from a recent shared common ancestor without intervening recombination. Segments several cM long can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample, and there are currently efforts to detect shorter segments from sequencing. Here, we study a problem of identifiability: because existing approaches detect IBD based on contiguous segments of identity-by-state, inferred long segments of IBD may arise from the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that significant proportions of inferred segments 1–2 cM long are results of conflations of two or more shorter segments, each at least 0.2 cM or longer, under demographic scenarios typical for modern humans for all programs tested. The impact of such conflation is much smaller for longer (> 2 cM) segments. This biases the inferred IBD segment length distribution, and so can affect downstream inferences that depend on the assumption that each segment of IBD derives from a single common ancestor. As an example, we present and analyze an estimator of the de novo mutation rate using IBD segments, and demonstrate that unmodeled conflation leads to underestimates of the ages of the common ancestors on these segments, and hence a significant overestimate of the mutation rate. Understanding the conflation effect in detail will make its correction in future methods more tractable.
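The conflation effect described in this abstract can be illustrated with a toy sketch. It is not the authors' coalescent simulation: the function name `conflate`, the gap tolerance `max_gap_cm`, and the example coordinates are all hypothetical, and the merge rule (join segments whose gap is below a tolerance) is only an assumed stand-in for how an identity-by-state based detector might fail to see a short break.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_cM, end_cM) on one chromosome


def conflate(segments: List[Segment], max_gap_cm: float) -> List[Segment]:
    """Merge true IBD segments whose separating gap is at most max_gap_cm.

    Assumption for illustration: a detector cannot resolve gaps this short,
    so neighbouring segments are reported as one longer segment.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_cm:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Two short segments (0.6 cM and 0.65 cM) separated by a 0.05 cM gap,
# plus one genuinely long segment far away.
true_segments = [(10.0, 10.6), (10.65, 11.3), (20.0, 22.5)]
inferred = conflate(true_segments, max_gap_cm=0.1)
lengths = [round(end - start, 2) for start, end in inferred]
print(inferred)  # the two short segments appear as a single 1.3 cM segment
print(lengths)
```

In this toy case the inferred 1.3 cM segment falls in the 1 to 2 cM range even though neither underlying segment does, which is the kind of length bias the abstract describes.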
|
30
|
Yang S, Carmi S, Pe'er I. Rapidly Registering Identity-by-Descent Across Ancestral Recombination Graphs. J Comput Biol 2016; 23:495-507. [PMID: 27104872 DOI: 10.1089/cmb.2016.0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
The genomes of remotely related individuals occasionally contain long segments that are identical by descent (IBD). Sharing of IBD segments has many applications in population and medical genetics, and it is thus desirable to study their properties in simulations. However, no current method provides a direct, efficient means to extract IBD segments from simulated genealogies. Here, we introduce computationally efficient approaches to extract ground-truth IBD segments from a sequence of genealogies, or equivalently, an ancestral recombination graph. Specifically, we use a two-step scheme, where we first identify putative shared segments by comparing the common ancestors of all pairs of individuals at some distance apart. This reduces the search space considerably, and we then proceed by determining the true IBD status of the candidate segments. Under some assumptions and when allowing a limited resolution of segment lengths, our run-time complexity is reduced from O(n³ log n) for the naïve algorithm to O(n log n), where n is the number of individuals in the sample.
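For orientation, the per-pair scan that an exhaustive approach repeats for every pair of samples might look like the sketch below. This is an assumed simplification, not the two-step scheme of the paper: it takes as given, for one pair, the interval covered by each marginal genealogy and an identifier for the pair's most recent common ancestor (MRCA) in that genealogy, and it treats a maximal run of adjacent genealogies with an unchanged MRCA as one IBD segment. The names `TreeSpan`, `ibd_segments`, and `min_len` are hypothetical.

```python
from typing import List, Tuple

# (left, right, mrca_id): interval of one marginal genealogy and the
# pair's MRCA node in it. All values here are made up for illustration.
TreeSpan = Tuple[float, float, int]


def ibd_segments(spans: List[TreeSpan], min_len: float = 0.0) -> List[TreeSpan]:
    """Merge adjacent genealogy intervals that share the same MRCA for a pair.

    Spans are assumed sorted by position and non-overlapping.
    Segments shorter than min_len are discarded.
    """
    segments: List[TreeSpan] = []
    for left, right, mrca in spans:
        last = segments[-1] if segments else None
        if last is not None and last[2] == mrca and last[1] == left:
            segments[-1] = (last[0], right, mrca)  # extend the current run
        else:
            segments.append((left, right, mrca))   # MRCA changed: new segment
    return [s for s in segments if s[1] - s[0] >= min_len]


spans = [(0.0, 1.0, 7), (1.0, 2.5, 7), (2.5, 3.0, 9), (3.0, 6.0, 7)]
result = ibd_segments(spans, min_len=1.0)
print(result)
```

Running this scan over every pair of samples is what makes the exhaustive approach expensive as n grows; the abstract's contribution is a way to prune candidate pairs before checking them.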
Affiliation(s)
- Shuo Yang
- Department of Computer Science, Columbia University, New York, New York
| | - Shai Carmi
- Braun School of Public Health, Faculty of Medicine, Hebrew University, Jerusalem, Israel
| | - Itsik Pe'er
- Department of Computer Science, Columbia University, New York, New York; Department of Systems Biology, Columbia University, New York, New York
|
31
|
Bodea CA, Middleton FA, Melhem NM, Klei L, Song Y, Tiobech J, Marumoto P, Yano V, Faraone SV, Roeder K, Myles-Worsley M, Devlin B, Byerley W. Analysis of Shared Haplotypes amongst Palauans Maps Loci for Psychotic Disorders to 4q28 and 5q23-q31. Complex Psychiatry 2016; 2:173-184. [DOI: 10.1159/000450726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2015] [Accepted: 08/19/2016] [Indexed: 11/19/2022] Open
Abstract
To localize genetic variation affecting risk for psychotic disorders in the population of Palau, we genotyped DNA samples from 203 Palauan individuals diagnosed with psychotic disorders, broadly defined, and 125 control subjects using a genome-wide single nucleotide polymorphism array. Palau has unique features advantageous for this study: due to its population history, Palauans are substantially interrelated; affected individuals often, but not always, cluster in families; and we have essentially complete ascertainment of affected individuals. To localize risk variants to genomic regions, we evaluated long-shared haplotypes, ≥10 Mb, identifying clusters of affected individuals who share such haplotypes. This extensive sharing, typically identical by descent, was significantly greater in cases than population controls, even after controlling for relatedness. Several regions of the genome exhibited substantial excess of shared haplotypes for affected individuals, including 3p21, 3p12, 4q28, and 5q23-q31. Two of these regions, 4q28 and 5q23-q31, showed significant linkage by traditional LOD score analysis and could harbor variants of more sizeable risk for psychosis or a multiplicity of risk variants. The pattern of haplotype sharing in 4q28 highlights PCDH10, encoding a cadherin-related neuronal receptor, as possibly involved in risk.
|
32
|
Ying D, Sham PC, Smith DK, Zhang L, Lau YL, Yang W. HaploShare: identification of extended haplotypes shared by cases and evaluation against controls. Genome Biol 2015; 16:92. [PMID: 25956955 PMCID: PMC4432975 DOI: 10.1186/s13059-015-0662-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/02/2013] [Accepted: 04/24/2015] [Indexed: 11/11/2022] Open
Abstract
Recent founder mutations may play important roles in complex diseases and Mendelian disorders. Detecting shared haplotypes that are identical by descent (IBD) could facilitate discovery of these mutations. Several programs address this, but are usually limited to detecting pair-wise shared haplotypes and not providing a comparison of cases and controls. We present a novel algorithm and software package, HaploShare, which detects extended haplotypes that are shared by multiple individuals, and allows comparisons between cases and controls. Testing on simulated and real cases demonstrated significant improvements in detection power and reduction of false positive rate by HaploShare relative to other programs.
Affiliation(s)
- Dingge Ying
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong; Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong.
| | - Pak Chung Sham
- Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong; Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong.
| | - David Keith Smith
- State Key Laboratory for Emerging Infectious Diseases, The University of Hong Kong, Pokfulam, Hong Kong.
| | - Lu Zhang
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong.
| | - Yu Lung Lau
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong.
| | - Wanling Yang
- Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong; Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong.
|
33
|
Pardo-Diaz C, Salazar C, Jiggins CD. Towards the identification of the loci of adaptive evolution. Methods Ecol Evol 2015; 6:445-464. [PMID: 25937885 PMCID: PMC4409029 DOI: 10.1111/2041-210x.12324] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 08/21/2014] [Accepted: 11/28/2014] [Indexed: 12/17/2022]
Abstract
1. Establishing the genetic and molecular basis underlying adaptive traits is one of the major goals of evolutionary geneticists in order to understand the connection between genotype and phenotype and elucidate the mechanisms of evolutionary change. Despite considerable effort to address this question, there remain relatively few systems in which the genes shaping adaptations have been identified. 2. Here, we review the experimental tools that have been applied to document the molecular basis underlying evolution in several natural systems, in order to highlight their benefits, limitations and suitability. In most cases, a combination of DNA, RNA and functional methodologies with field experiments will be needed to uncover the genes and mechanisms shaping adaptation in nature.
Affiliation(s)
- Carolina Pardo-Diaz
- Biology Program, Faculty of Natural Sciences and Mathematics, Universidad del Rosario, Carrera 24 No 63C-69, Bogotá 111221, Colombia
- Camilo Salazar
- Biology Program, Faculty of Natural Sciences and Mathematics, Universidad del Rosario, Carrera 24 No 63C-69, Bogotá 111221, Colombia
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
|
34
|
Park DS, Baran Y, Hormozdiari F, Eng C, Torgerson DG, Burchard EG, Zaitlen N. PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling. BMC Bioinformatics 2015; 16 Suppl 5:S9. [PMID: 25860540 PMCID: PMC4402697 DOI: 10.1186/1471-2105-16-s5-s9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/19/2022] Open
Abstract
Identifying segments in the genome of different individuals that are identical-by-descent (IBD) is a fundamental element of genetics. IBD data is used for numerous applications including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has proven to be computationally difficult. To overcome this, many state of the art methods estimate the probability of IBD between each pair of haplotypes separately. While computationally efficient, these methods fail to leverage the clique structure of IBD resulting in less powerful IBD identification, especially for small IBD segments. We develop a hybrid approach (PIGS), which combines the computational efficiency of pairwise methods with the power of multiway methods. It leverages the IBD graph structure to compute the probability of IBD conditional on all pairwise estimates simultaneously. We show via extensive simulations and analysis of real data that our method produces a substantial increase in the number of identified small IBD segments.
|
35
|
Mukherjee S, Guha S, Ikeda M, Iwata N, Malhotra AK, Pe'er I, Darvasi A, Lencz T. Excess of homozygosity in the major histocompatibility complex in schizophrenia. Hum Mol Genet 2014; 23:6088-95. [PMID: 24943592 PMCID: PMC4204767 DOI: 10.1093/hmg/ddu308] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/11/2014] [Revised: 04/11/2014] [Accepted: 06/12/2014] [Indexed: 01/20/2023] Open
Abstract
Genome-wide association studies (GWAS) in schizophrenia have focused on additive allelic effects to identify disease risk loci. In order to examine potential recessive effects, we applied a novel approach to identify regions of excess homozygosity in an ethnically homogenous cohort: 904 schizophrenia cases and 1640 controls drawn from the Ashkenazi Jewish (AJ) population. Genome-wide examination of runs of homozygosity identified an excess in cases localized to the major histocompatibility complex (MHC). To refine this signal, we used the recently developed GERMLINE algorithm to identify chromosomal segments shared identical-by-descent (IBD) and compared homozygosity at such segments in cases and controls. We found a significant excess of homozygosity in schizophrenia cases compared with controls in the MHC (P-value = 0.003). An independent replication cohort of 548 schizophrenia cases from Japan and 542 matched healthy controls demonstrated similar effects. The strongest case-control recessive effects (P = 8.81 × 10⁻⁸) were localized to a 53-kb region near HLA-A, in a segment encompassing three poorly annotated genes, TRIM10, TRIM15 and TRIM40. At the same time, an adjacent segment in the Class I MHC demonstrated clear additive effects on schizophrenia risk, demonstrating the complexity of association in the MHC and the ability of our IBD approach to refine localization of broad signals derived from conventional GWAS. In sum, homozygosity in the classical MHC region appears to convey significant risk for schizophrenia, consistent with the ecological literature suggesting that homozygosity at the MHC locus may be associated with vulnerability to disease.
Affiliation(s)
- Semanti Mukherjee
- The Zucker Hillside Hospital, Psychiatry Research, 75-59 263rd Street, Glen Oaks, NY 11004, USA; Feinstein Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA
- Saurav Guha
- The Zucker Hillside Hospital, Psychiatry Research, 75-59 263rd Street, Glen Oaks, NY 11004, USA; Feinstein Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA
- Masashi Ikeda
- Fujita Health University School of Medicine, 1-98 Kutsukake-cho Dengakugakubo, Toyoake, Aichi 470-1192, Japan
- Nakao Iwata
- Fujita Health University School of Medicine, 1-98 Kutsukake-cho Dengakugakubo, Toyoake, Aichi 470-1192, Japan
- Anil K Malhotra
- The Zucker Hillside Hospital, Psychiatry Research, 75-59 263rd Street, Glen Oaks, NY 11004, USA; Feinstein Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA; Hofstra University School of Medicine, 500 Hofstra University, Hempstead, NY 11549, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Ariel Darvasi
- Department of Genetics, The Institute of Life Sciences, The Hebrew University of Jerusalem, Givat Ram, Jerusalem, Israel
- Todd Lencz
- The Zucker Hillside Hospital, Psychiatry Research, 75-59 263rd Street, Glen Oaks, NY 11004, USA; Feinstein Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA; Hofstra University School of Medicine, 500 Hofstra University, Hempstead, NY 11549, USA
|
36
|
Carmi S, Wilton PR, Wakeley J, Pe'er I. A renewal theory approach to IBD sharing. Theor Popul Biol 2014; 97:35-48. [PMID: 25149691 DOI: 10.1016/j.tpb.2014.08.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/13/2014] [Revised: 07/30/2014] [Accepted: 08/08/2014] [Indexed: 10/24/2022]
Abstract
A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be identical-by-descent (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC'), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC') for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.
Affiliation(s)
- Shai Carmi
- Department of Computer Science, Columbia University, New York, NY, 10027, USA
- Peter R Wilton
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, 02138, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY, 10027, USA
|
37
|
Guo W, Shugart YY. The power comparison of the haplotype-based collapsing tests and the variant-based collapsing tests for detecting rare variants in pedigrees. BMC Genomics 2014; 15:632. [PMID: 25070353 PMCID: PMC4131059 DOI: 10.1186/1471-2164-15-632] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/18/2013] [Accepted: 07/18/2014] [Indexed: 11/20/2022] Open
Abstract
Background: Both common and rare genetic variants have been shown to contribute to the etiology of complex diseases. Recent genome-wide association studies (GWAS) have successfully investigated how common variants contribute to the genetic factors associated with common human diseases. However, understanding the impact of rare variants, which are abundant in the human population (one in every 17 bases), remains challenging. A number of statistical tests have been developed to analyze collapsed rare variants identified by association tests. Here, we propose a haplotype-based approach. This work was inspired by an existing statistical framework, the pedigree disequilibrium test (PDT), which uses genetic data to assess the effects of variants in general pedigrees. We aim to compare the performance of the haplotype-based approach and the rare variant-based approach for detecting rare causal variants in pedigrees. Results: Extensive simulations in the sequencing setting were carried out to evaluate and compare the haplotype-based approach with the rare variant methods that drew on a more conventional collapsing strategy. As assessed through a variety of scenarios, the haplotype-based pedigree tests had enhanced statistical power compared with the rare variant-based pedigree tests when the disease of interest was mainly caused by rare haplotypes (with multiple rare alleles), and vice versa when disease was caused by rare variants acting independently. In most other situations, when disease was caused both by haplotypes with multiple rare alleles and by rare variants with similar effects, the two approaches provided similar power in testing for association. Conclusions: The haplotype-based approach was designed to assess the role of rare and potentially causal haplotypes. The proposed rare variant-based pedigree tests were designed to assess the role of rare and potentially causal variants. This study clearly documented the situations under which either method performs better than the other. All tests have been implemented in software submitted to the Comprehensive R Archive Network (CRAN) for general use as a program named rvHPDT.
Affiliation(s)
- Yin Yao Shugart
- Division of Intramural Division Program, National Institute of Mental Health, National Institutes of Health, 35 Convent Drive, Bethesda, MD 20892, USA.
|
38
|
Vacic V, Ozelius LJ, Clark LN, Bar-Shira A, Gana-Weisz M, Gurevich T, Gusev A, Kedmi M, Kenny EE, Liu X, Mejia-Santana H, Mirelman A, Raymond D, Saunders-Pullman R, Desnick RJ, Atzmon G, Burns ER, Ostrer H, Hakonarson H, Bergman A, Barzilai N, Darvasi A, Peter I, Guha S, Lencz T, Giladi N, Marder K, Pe'er I, Bressman SB, Orr-Urtreger A. Genome-wide mapping of IBD segments in an Ashkenazi PD cohort identifies associated haplotypes. Hum Mol Genet 2014; 23:4693-702. [PMID: 24842889 DOI: 10.1093/hmg/ddu158] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 01/20/2023] Open
Abstract
The recent series of large genome-wide association studies in European and Japanese cohorts established that Parkinson disease (PD) has a substantial genetic component. To further investigate the genetic landscape of PD, we performed a genome-wide scan in the largest to date Ashkenazi Jewish cohort of 1130 Parkinson patients and 2611 pooled controls. Motivated by the reduced disease allele heterogeneity and a high degree of identical-by-descent (IBD) haplotype sharing in this founder population, we conducted a haplotype association study based on mapping of shared IBD segments. We observed significant haplotype association signals at three previously implicated Parkinson loci: LRRK2 (OR = 12.05, P = 1.23 × 10⁻⁵⁶), MAPT (OR = 0.62, P = 1.78 × 10⁻¹¹) and GBA (multiple distinct haplotypes, OR > 8.28, P = 1.13 × 10⁻¹¹ and OR = 2.50, P = 1.22 × 10⁻⁹). In addition, we identified a novel association signal on chr2q14.3 coming from a rare haplotype (OR = 22.58, P = 1.21 × 10⁻¹⁰) and replicated it in a secondary cohort of 306 Ashkenazi PD cases and 2583 controls. Our results highlight the power of our haplotype association method, particularly useful in studies of founder populations, and reaffirm the benefits of studying complex diseases in Ashkenazi Jewish cohorts.
Affiliation(s)
- Vladimir Vacic
- Department of Computer Science, Columbia University, New York, NY, USA
- Laurie J Ozelius
- Department of Genetics and Genomic Sciences and Department of Neurology, Mount Sinai School of Medicine, New York, NY, USA
- Lorraine N Clark
- Department of Pathology and Cell Biology, Taub Institute for Research on Alzheimer's Disease and the Aging Brain
- Tanya Gurevich
- Department of Neurology, Movement Disorders Unit and Parkinson Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Alexander Gusev
- Department of Computer Science, Columbia University, New York, NY, USA
- Eimear E Kenny
- Department of Computer Science, Columbia University, New York, NY, USA
- Xinmin Liu
- Department of Pathology and Cell Biology
- Anat Mirelman
- Department of Neurology, Movement Disorders Unit and Parkinson Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Deborah Raymond
- Mirken Department of Neurology, Beth Israel Medical Center, New York, NY, USA
- Rachel Saunders-Pullman
- Mirken Department of Neurology, Beth Israel Medical Center, New York, NY, USA; The Saul R. Korey Department of Neurology
- Gil Atzmon
- Department of Medicine, Department of Genetics, Institute for Aging Research
- Harry Ostrer
- Department of Genetics, Department of Pathology, Department of Pediatrics
- Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Aviv Bergman
- Department of Systems and Computational Biology and
- Nir Barzilai
- Department of Medicine, Department of Genetics, Institute for Aging Research
- Ariel Darvasi
- Department of Genetics, Institute of Life Sciences, Hebrew University of Jerusalem, Givat Ram, Jerusalem, Israel
- Inga Peter
- Department of Genetics and Genomic Sciences and
- Saurav Guha
- Department of Genetics and Genomic Sciences and Department of Psychiatry, Division of Research, The Zucker Hillside Hospital Division of the North Shore-Long Island Jewish Health System, Glen Oaks, NY, USA
- Todd Lencz
- Department of Psychiatry, Division of Research, The Zucker Hillside Hospital Division of the North Shore-Long Island Jewish Health System, Glen Oaks, NY, USA; Center for Psychiatric Neuroscience, The Feinstein Institute for Medical Research, Manhasset, NY, USA; Department of Psychiatry and Behavioral Science, Albert Einstein College of Medicine, Bronx, NY, USA; Department of Psychiatry and Department of Molecular Medicine, Hofstra University School of Medicine, Hempstead, NY, USA
- Nir Giladi
- Department of Neurology, Movement Disorders Unit and Parkinson Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Karen Marder
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology and Department of Psychiatry, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY, USA
- Susan B Bressman
- Mirken Department of Neurology, Beth Israel Medical Center, New York, NY, USA; The Saul R. Korey Department of Neurology
- Avi Orr-Urtreger
- Genetic Institute and Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
39
|
Zheng C, Kuhner MK, Thompson EA. Bayesian inference of local trees along chromosomes by the sequential Markov coalescent. J Mol Evol 2014; 78:279-92. [PMID: 24817610 PMCID: PMC4104301 DOI: 10.1007/s00239-014-9620-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/22/2013] [Accepted: 04/18/2014] [Indexed: 11/30/2022]
Abstract
We propose a genealogy-sampling algorithm, Sequential Markov Ancestral Recombination Tree (SMARTree), that provides an approach to estimation from SNP haplotype data of the patterns of coancestry across a genome segment among a set of homologous chromosomes. To enable analysis across longer segments of genome, the sequence of coalescent trees is modeled via the modified sequential Markov coalescent (Marjoram and Wall, BMC Genetics 7:16, 2006). To assess performance in estimating these local trees, our SMARTree implementation is tested on simulated data. Our base data set is of the SNPs in 10 DNA sequences over 50 kb. We examine the effects of longer sequences and of more sequences, and of a recombination and/or mutational hotspot. The model underlying SMARTree is an approximation to the full recombinant-coalescent distribution. However, in a small trial on simulated data, recovery of local trees was similar to that of LAMARC (Kuhner et al. Genetics 156:1393-1401, 2000a), a sampler which uses the full model.
Affiliation(s)
- Chaozhi Zheng
- Department of Statistics, Box 354322, University of Washington, Seattle, WA 98115-4322, USA, Tel.: (206) 543-7237, Fax: (206) 685-7419
- Mary K. Kuhner
- Department of Genome Sciences, Box 355065, University of Washington, Seattle, WA 98115-5065, USA, Tel.: (206) 543-8751, Fax: (206) 685-7301
- Elizabeth A. Thompson
- Department of Statistics, Box 354322, University of Washington, Seattle, WA 98115-4322, USA, Tel.: (206) 685-0108, Fax: (206) 685-7419
|
40
|
Durand EY, Eriksson N, McLean CY. Reducing pervasive false-positive identical-by-descent segments detected by large-scale pedigree analysis. Mol Biol Evol 2014; 31:2212-22. [PMID: 24784137 PMCID: PMC4104314 DOI: 10.1093/molbev/msu151] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Indexed: 01/01/2023] Open
Abstract
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
|
41
|
O'Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, Traglia M, Huang J, Huffman JE, Rudan I, McQuillan R, Fraser RM, Campbell H, Polasek O, Asiki G, Ekoru K, Hayward C, Wright AF, Vitart V, Navarro P, Zagury JF, Wilson JF, Toniolo D, Gasparini P, Soranzo N, Sandhu MS, Marchini J. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet 2014; 10:e1004234. [PMID: 24743097 PMCID: PMC3990520 DOI: 10.1371/journal.pgen.1004234] [Citation(s) in RCA: 381] [Impact Index Per Article: 38.1] [Received: 05/15/2013] [Accepted: 01/27/2014] [Indexed: 01/20/2023] Open
Abstract
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
Affiliation(s)
- Jared O'Connell
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Statistics, University of Oxford, Oxford, United Kingdom
- Deepti Gurdasani
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Olivier Delaneau
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Nicola Pirastu
- Institute for Maternal and Child Health - IRCCS Burlo Garofolo, University of Trieste, Trieste, Italy
- Sheila Ulivi
- Institute for Maternal and Child Health - IRCCS Burlo Garofolo, Trieste, Italy
- Massimiliano Cocca
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Michela Traglia
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Jie Huang
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Jennifer E Huffman
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Igor Rudan
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Ruth McQuillan
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Ross M Fraser
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Harry Campbell
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Ozren Polasek
- Faculty of Medicine, University of Split, Split, Croatia
- Gershim Asiki
- Medical Research Council/Uganda Virus Research Institute (MRC/UVRI), Uganda Research Unit on AIDS, Entebbe, Uganda
- Kenneth Ekoru
- Laboratoire Génomique, Bioinformatique, et Applications (EA4627), Conservatoire National des Arts et Métiers, Paris, France
- Caroline Hayward
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Alan F Wright
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Veronique Vitart
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Pau Navarro
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
- Jean-Francois Zagury
- Laboratoire Génomique, Bioinformatique, et Applications (EA4627), Conservatoire National des Arts et Métiers, Paris, France
- James F Wilson
- Centre for Population Health Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Daniela Toniolo
- Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milano, Italy
- Paolo Gasparini
- Institute for Maternal and Child Health - IRCCS Burlo Garofolo, University of Trieste, Trieste, Italy
- Nicole Soranzo
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Manjinder S Sandhu
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Jonathan Marchini
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Department of Statistics, University of Oxford, Oxford, United Kingdom
|
42
|
Howey R, Cordell HJ. Imputation without doing imputation: a new method for the detection of non-genotyped causal variants. Genet Epidemiol 2014; 38:173-90. [PMID: 24535679 PMCID: PMC4150535 DOI: 10.1002/gepi.21792] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/22/2013] [Revised: 12/30/2013] [Accepted: 12/31/2013] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies allow detection of non-genotyped disease-causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped "anchor" SNP, a nearby genotyped "partner" SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disc space, and generating a smaller number of false-positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels. Our method also increases the signal of association from P ≈ 2 × 10⁻⁶ to P ≈ 6 × 10⁻¹¹. Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest.
Affiliation(s)
- Richard Howey
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
- Heather J Cordell
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
|
43
|
Qian Y, Browning BL, Browning SR. Efficient clustering of identity-by-descent between multiple individuals. Bioinformatics 2013; 30:915-22. [PMID: 24363374 DOI: 10.1093/bioinformatics/btt734] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
MOTIVATION Most existing identity-by-descent (IBD) detection methods only consider haplotype pairs; less attention has been paid to considering multiple haplotypes simultaneously, even though IBD is an equivalence relation on haplotypes that partitions a set of haplotypes into IBD clusters. Multiple-haplotype IBD clusters may have advantages over pairwise IBD in some applications, such as IBD mapping. Existing methods for detecting multiple-haplotype IBD clusters are often computationally expensive and unable to handle large samples with thousands of haplotypes. RESULTS We present a clustering method, efficient multiple-IBD, which uses pairwise IBD segments to infer multiple-haplotype IBD clusters. It expands clusters from seed haplotypes by adding qualified neighbors and extends clusters across sliding windows in the genome. Our method is an order of magnitude faster than existing methods and has comparable performance with respect to the quality of clusters it uncovers. We further investigate the potential application of multiple-haplotype IBD clusters in association studies by testing for association between multiple-haplotype IBD clusters and low-density lipoprotein cholesterol in the Northern Finland Birth Cohort. Using our multiple-haplotype IBD cluster approach, we found an association with a genomic interval covering the PCSK9 gene in these data that is missed by standard single-marker association tests. Previously published studies confirm association of PCSK9 with low-density lipoprotein. AVAILABILITY AND IMPLEMENTATION Source code is available under the GNU Public License http://cs.au.dk/~qianyuxx/EMI/.
Affiliation(s)
- Yu Qian
- Bioinformatics Research Center, Aarhus Universitet, 8000C Aarhus, Denmark; Department of Biostatistics and Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
|
44
|
He D. IBD-Groupon: an efficient method for detecting group-wise identity-by-descent regions simultaneously in multiple individuals based on pairwise IBD relationships. Bioinformatics 2013; 29:i162-70. [PMID: 23812980 PMCID: PMC3694672 DOI: 10.1093/bioinformatics/btt237] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Detecting IBD tracts is an important problem in genetics. Most of the existing methods focus on detecting pairwise IBD tracts, which have relatively low power to detect short IBD tracts. Methods to detect IBD tracts among multiple individuals simultaneously, or group-wise IBD tracts, have better performance for short IBD tracts detection. Group-wise IBD tracts can be applied to a wide range of applications, such as disease mapping, pedigree reconstruction and so forth. The existing group-wise IBD tract detection method is computationally inefficient and is only able to handle small datasets, such as 20, 30 individuals with hundreds of SNPs. It also requires a previous specification of the number of IBD groups, or partitions of the individuals where all the individuals in the same partition are IBD with each other, which may not be realistic in many cases. The method can only handle a small number of IBD groups, such as two or three, because of scalability issues. What is more, it does not take LD (linkage disequilibrium) into consideration. RESULTS In this work, we developed an efficient method IBD-Groupon, which detects group-wise IBD tracts based on pairwise IBD relationships, and it is able to address all the drawbacks aforementioned. To our knowledge, our method is the first practical group-wise IBD tracts detection method that is scalable to very large datasets, for example, hundreds of individuals with thousands of SNPs, and in the meanwhile, it is powerful to detect short IBD tracts. Our method does not need to specify the number of IBD groups, which will be detected automatically. And our method takes LD into consideration, as it is based on pairwise IBD tracts where LD can be easily incorporated.
Affiliation(s)
- Dan He
- Computational Genomics, IBM TJ Watson Research, Yorktown Heights, NY 10598, USA.
|
45
|
Abstract
Summary: Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent. Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and they can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and they were used to reconstruct recent demographic events in several populations. We here extend the framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formulation in several demographic scenarios, compare our approach with methods based on ancestry deconvolution and use this method to analyze Masai samples from the HapMap 3 dataset. Availability: DoRIS, a Java implementation of the proposed method, and its source code are freely available at http://www.cs.columbia.edu/~pier/doris. Contact: itsik@cs.columbia.edu
Affiliation(s)
- Pier Francesco Palamara
- Department of Computer Science, Columbia University, 500 West 120th, New York City, NY 10027, USA
|
46
|
Hochreiter S. HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data. Nucleic Acids Res 2013; 41:e202. [PMID: 24174545 PMCID: PMC3905877 DOI: 10.1093/nar/gkt1013] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.
Affiliation(s)
- Sepp Hochreiter
- Institute of Bioinformatics, Johannes Kepler University, Linz, Austria
|
47
|
Han B, Kang EY, Raychaudhuri S, de Bakker PIW, Eskin E. Fast pairwise IBD association testing in genome-wide association studies. Bioinformatics 2013; 30:206-13. [PMID: 24158599 DOI: 10.1093/bioinformatics/btt609] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023]
Abstract
MOTIVATION Recently, investigators have proposed state-of-the-art Identity-by-descent (IBD) mapping methods to detect IBD segments between purportedly unrelated individuals. The IBD information can then be used for association testing in genetic association studies. One approach for this IBD association testing strategy is to test for excessive IBD between pairs of cases ('pairwise method'). However, this approach is inefficient because it requires a large number of permutations. Moreover, a limited number of permutations define a lower bound for P-values, which makes fine-mapping of associated regions difficult because, in practice, a much larger genomic region is implicated than the region that is actually associated. RESULTS In this article, we introduce a new pairwise method 'Fast-Pairwise'. Fast-Pairwise uses importance sampling to improve efficiency and enable approximation of extremely small P-values. Fast-Pairwise method takes only days to complete a genome-wide scan. In the application to the WTCCC type 1 diabetes data, Fast-Pairwise successfully fine-maps a known human leukocyte antigen gene that is known to cause the disease. AVAILABILITY Fast-Pairwise is publicly available at: http://genetics.cs.ucla.edu/graphibd.
Affiliation(s)
- Buhm Han
- Division of Genetics, Brigham and Women's Hospital and Division of Rheumatology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA, Partners Center for Personalized Genetic Medicine, Boston, MA 02115, USA, Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA, Computer Science Department, University of California, Los Angeles, CA 90095, USA, Faculty of Medical and Human Sciences, University of Manchester, Manchester, UK, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands, Department of Medical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands and Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
|
48
|
Gauvin H, Moreau C, Lefebvre JF, Laprise C, Vézina H, Labuda D, Roy-Gagnon MH. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur J Hum Genet 2013; 22:814-21. [PMID: 24129432 PMCID: PMC4023206 DOI: 10.1038/ejhg.2013.227] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 03/25/2013] [Revised: 08/07/2013] [Accepted: 09/04/2013] [Indexed: 12/16/2022] Open
Abstract
In genetics the ability to accurately describe the familial relationships among a group of individuals can be very useful. Recent statistical tools succeeded in assessing the degree of relatedness up to 6-7 generations with good power using dense genome-wide single-nucleotide polymorphism data to estimate the extent of identity-by-descent (IBD) sharing. It is therefore important to describe genome-wide patterns of IBD sharing for more remote and complex relatedness between individuals, such as that observed in a founder population like Quebec, Canada. Taking advantage of the extended genealogical records of the French Canadian founder population, we first compared different tools to identify regions of IBD in order to best describe genome-wide IBD sharing and its correlation with genealogical characteristics. Results showed that the extent of IBD sharing identified with FastIBD correlates best with relatedness measured using genealogical data. Total length of IBD sharing explained 85% of the genealogical kinship's variance. In addition, we observed significantly higher sharing in pairs of individuals with at least one inbred ancestor compared with those without any. Furthermore, patterns of IBD sharing and average sharing were different across regional populations, consistent with the settlement history of Quebec. Our results suggest that, as expected, the complex relatedness present in founder populations is reflected in patterns of IBD sharing. Using these patterns, it is thus possible to gain insight on the types of distant relationships in a sample from a founder population like Quebec.
Affiliation(s)
- Héloïse Gauvin
- [1] Département de médecine sociale et préventive, Université de Montréal, Montréal, Québec, Canada [2] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Claudia Moreau
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Jean-François Lefebvre
- Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada
- Catherine Laprise
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Hélène Vézina
- Département des sciences humaines, Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada
- Damian Labuda
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Département de pédiatrie, Université de Montréal, Montréal, Québec, Canada
- Marie-Hélène Roy-Gagnon
- [1] Centre de recherche, Centre hospitalier universitaire Sainte-Justine, Université de Montréal, Montréal, Québec, Canada [2] Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
|
49
|
Glodzik D, Navarro P, Vitart V, Hayward C, McQuillan R, Wild SH, Dunlop MG, Rudan I, Campbell H, Haley C, Wright AF, Wilson JF, McKeigue P. Inference of identity by descent in population isolates and optimal sequencing studies. Eur J Hum Genet 2013; 21:1140-5. [PMID: 23361219 PMCID: PMC3778345 DOI: 10.1038/ejhg.2012.307] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/01/2012] [Revised: 12/18/2012] [Accepted: 12/28/2012] [Indexed: 01/24/2023] Open
Abstract
In an isolated population, individuals are likely to share large genetic regions inherited from common ancestors. Identity by descent (IBD) can be inferred from SNP genotypes, which is useful in a number of applications, including identifying genetic variants influencing complex disease risk, and planning efficient cohort-sequencing strategies. We present ANCHAP, a method for detecting IBD in isolated populations. We compare accuracy of the method against other long-range and local phasing methods, using parent-offspring trios. In our experiments, we show that ANCHAP performs similarly to the other long-range method, but requires an order of magnitude fewer computational resources. A local phasing model is able to achieve similar sensitivity, but only at the cost of higher false discovery rates. In some regions of the genome, the studied individuals share haplotypes particularly often, which hints at the history of the populations studied. We demonstrate the method using SNP genotypes from three isolated island populations, as well as in a cohort of unrelated individuals. In samples from three isolated populations of around 1000 individuals each, an average individual shares a haplotype at a genetic locus with 9-12 other individuals, compared with only 1 individual within the non-isolated population. We describe an application of ANCHAP to optimally choose samples in resequencing studies. We find that with sample sizes of 1000 individuals from an isolated population genotyped using a dense SNP array, and with 20% of these individuals sequenced, 65% of sequences of the unsequenced subjects can be partially inferred.
Affiliation(s)
- Dominik Glodzik
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Pau Navarro
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Veronique Vitart
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Caroline Hayward
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Ruth McQuillan
- College of Medicine and Veterinary Medicine, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- Sarah H Wild
- College of Medicine and Veterinary Medicine, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- Malcolm G Dunlop
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Igor Rudan
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Harry Campbell
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Chris Haley
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Alan F Wright
- MRC Institute of Genetics and Molecular Medicine (MRC IGMM), MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, UK
- James F Wilson
- College of Medicine and Veterinary Medicine, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
- Paul McKeigue
- College of Medicine and Veterinary Medicine, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
|
50
|
Abstract
This study addresses the question of how purifying selection operates during recent rapid population growth such as has been experienced by human populations. This is not a straightforward problem because the human population is not at equilibrium: population genetics predicts that, on the one hand, the efficacy of natural selection increases as population size increases, eliminating ever more weakly deleterious variants; on the other hand, a larger number of deleterious mutations will be introduced into the population and will be more likely to increase in their number of copies as the population grows. To understand how patterns of human genetic variation have been shaped by the interaction of natural selection and population growth, we examined the trajectories of mutations with varying selection coefficients, using computer simulations. We observed that while population growth dramatically increases the number of deleterious segregating sites in the population, it only mildly increases the number carried by each individual. Our simulations also show an increased efficacy of natural selection, reflected in a higher fraction of deleterious mutations eliminated at each generation and a more efficient elimination of the most deleterious ones. As a consequence, while each individual carries a larger number of deleterious alleles than expected in the absence of growth, the average selection coefficient of each segregating allele is less deleterious. Combined, our results suggest that the genetic risk of complex diseases in growing populations might be distributed across a larger number of more weakly deleterious rare variants.
|