1
Temple SD, Thompson EA. Identity-by-descent segments in large samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.06.05.597656. [PMID: 38895476 PMCID: PMC11185678 DOI: 10.1101/2024.06.05.597656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
If two haplotypes share the same alleles for an extended gene tract, these haplotypes are likely to be derived identical-by-descent from a recent common ancestor. Identity-by-descent segment lengths are correlated via unobserved ancestral tree and recombination processes, which commonly presents challenges to the derivation of theoretical results in population genetics. We show that the proportion of detectable identity-by-descent segments around a locus is normally distributed when the sample size and the scaled population size are large. We generalize this central limit theorem to cover flexible demographic scenarios, multi-way identity-by-descent segments, and multivariate identity-by-descent rates. We use efficient simulations to study the distributional behavior of the detectable identity-by-descent rate. One consequence of non-normality in finite samples is that a genome-wide scan looking for excess identity-by-descent rates may be subject to anti-conservative control of family-wise error rates.
Affiliation(s)
- Seth D. Temple
- Department of Statistics, University of Washington, Seattle, Washington, USA
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, USA
2
Temple SD, Browning SR, Thompson EA. Fast simulation of identity-by-descent segments. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.12.13.628449. [PMID: 39829821 PMCID: PMC11741331 DOI: 10.1101/2024.12.13.628449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
The worst-case runtime complexity to simulate haplotype segments identical by descent (IBD) is quadratic in sample size. We propose two main techniques to reduce the compute time, both of which are motivated by coalescent and recombination processes. We provide mathematical results that explain why our algorithm should outperform a naive implementation with high probability. In our experiments, we observe average compute times to simulate detectable IBD segments around a locus that scale approximately linearly in sample size and take a couple of seconds for sample sizes that are less than ten thousand diploid individuals. In contrast, we find that existing methods to simulate IBD segments take minutes to hours for sample sizes exceeding a few thousand diploid individuals. When using IBD segments to study recent positive selection around a locus, our efficient simulation algorithm makes feasible statistical inferences, e.g., parametric bootstrapping in analyses of large biobanks, that would be otherwise intractable.
Affiliation(s)
- Seth D. Temple
- Department of Statistics, University of Washington, Seattle, WA, USA
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, USA
3
Paye SM, Edge MD. Mathematical bounds on r² and the effect size in case-control genome-wide association studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.17.628943. [PMID: 39764044 PMCID: PMC11702690 DOI: 10.1101/2024.12.17.628943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Case-control genome-wide association studies (GWAS) are often used to find associations between genetic variants and diseases. When case-control GWAS are conducted, researchers must make decisions regarding how many cases and how many controls to include in the study. Depending on differing availability and cost of controls and cases, varying case fractions are used in case-control GWAS. Connections between variants and diseases are made using association statistics, including χ². Previous work in population genetics has shown that LD statistics, including r², are bounded by the allele frequencies in the population being studied. Since varying the case fraction changes sample allele frequencies, we use the known bounds on r² to explore how variation in the fraction of cases included in a study can impact statistical power to detect associations. We analyze a simple mathematical model and use simulations to study a quantity proportional to the χ² noncentrality parameter, which is closely related to r², under various conditions. Varying the case fraction changes the χ² noncentrality parameter, and by extension the statistical power, with effects depending on the dominance, penetrance, and frequency of the risk allele. Our framework explains previously observed results, such as asymmetries in power to detect risk vs. protective alleles, and the fact that a balanced sample of cases and controls does not always give the best power to detect associations, particularly for highly penetrant minor risk alleles that are either dominant or recessive. We show by simulation that our results can be used as a rough guide to statistical power for association tests other than χ² tests of independence.
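The statement that r² is bounded by allele frequencies can be illustrated with a minimal sketch. It uses only the standard constraint on the linkage disequilibrium coefficient D for two biallelic loci, namely max(-pA·pB, -(1-pA)(1-pB)) ≤ D ≤ min(pA(1-pB), (1-pA)pB), together with r² = D² / (pA(1-pA)pB(1-pB)). The function name `r2_max` is hypothetical, and this is not the case-control model analyzed in the paper.

```python
def r2_max(p_a: float, p_b: float) -> float:
    """Largest r^2 attainable for two biallelic loci whose focal alleles
    have frequencies p_a and p_b (illustrative helper, not from the paper)."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # Haplotype frequencies must be non-negative, which bounds D on both sides.
    d_min = max(-p_a * p_b, -(1.0 - p_a) * (1.0 - p_b))
    d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    # r^2 grows with |D|, so its maximum sits at one of the two endpoints.
    return max(d_min ** 2, d_max ** 2) / denom


# Matching frequencies allow perfect LD; mismatched ones do not.
print(r2_max(0.5, 0.5))  # about 1.0
print(r2_max(0.1, 0.5))  # about 0.111 (= 1/9)
```

Because the bound depends on how closely the two allele frequencies match, any change in sample allele frequencies, such as one induced by a different case fraction, moves the ceiling on r².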
Affiliation(s)
- Sanjana M Paye
- Department of Quantitative and Computational Biology, University of Southern California
- Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California
4
Temple SD, Waples RK, Browning SR. Modeling recent positive selection using identity-by-descent segments. Am J Hum Genet 2024; 111:2510-2529. [PMID: 39362217 PMCID: PMC11568764 DOI: 10.1016/j.ajhg.2024.08.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 08/29/2024] [Accepted: 08/30/2024] [Indexed: 10/05/2024] Open
Abstract
Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s≥0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
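The scan criterion quoted above (IBD rates more than four standard deviations above the genome-wide median) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `flag_excess_ibd` is hypothetical, the input is assumed to be one IBD rate per genomic position, and the spread is taken as the plain sample standard deviation of all rates, which may differ from how the authors estimate it.

```python
import statistics


def flag_excess_ibd(rates, threshold_sd=4.0):
    """Return indices of positions whose IBD rate exceeds
    median + threshold_sd * standard deviation (illustrative only)."""
    if len(rates) < 2:
        raise ValueError("need at least two IBD rates")
    median = statistics.median(rates)
    # Assumption: plain sample standard deviation over all positions.
    sd = statistics.stdev(rates)
    if sd == 0.0:
        return []  # no variation, so nothing can stand out
    cutoff = median + threshold_sd * sd
    return [i for i, rate in enumerate(rates) if rate > cutoff]


# Toy data: a flat background with a single strong peak at the last position.
toy_rates = [1.0] * 99 + [100.0]
print(flag_excess_ibd(toy_rates))
```

Centering on the median rather than the mean keeps the baseline from being pulled upward by the very peaks the scan is trying to find.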
Affiliation(s)
- Seth D Temple
- Department of Statistics, University of Washington, Seattle, WA, USA
- Ryan K Waples
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
5
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Strickland TL, Baker JT, Evans GF, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE. Detection of distant relatedness in biobanks to identify undiagnosed cases of Mendelian disease as applied to Long QT syndrome. Nat Commun 2024; 15:7507. [PMID: 39209900 PMCID: PMC11362435 DOI: 10.1038/s41467-024-51977-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 08/21/2024] [Indexed: 09/04/2024] Open
Abstract
Rare genetic diseases are typically studied in referral populations, resulting in underdiagnosis and biased assessment of penetrance and phenotype. To address this, we develop a generalizable method of genotype inference based on distant relatedness and deploy this to identify undiagnosed Type 5 Long QT Syndrome (LQT5) rare variant carriers in a non-referral population. We identify 9 LQT5 families referred to a single specialty clinic, each carrying p.Asp76Asn, the most common LQT5 variant. We uncover recent common ancestry and a single shared haplotype among probands. Application to a non-referral population of 69,819 BioVU biobank subjects identifies 22 additional subjects sharing this haplotype, which we confirm to carry p.Asp76Asn. Referral and non-referral carriers have prolonged QT interval corrected for heart rate (QTc) compared to controls, and, among carriers, the QTc polygenic score is independently associated with QTc prolongation. Thus, our innovative analysis of shared chromosomal segments identifies undiagnosed cases of genetic disease and refines the understanding of LQT5 penetrance and phenotype.
Affiliation(s)
- Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Hung-Hsin Chen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 11524, Taiwan
- M Benjamin Shoemaker
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Matthew R Fleming
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Teresa L Strickland
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- James T Baker
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Grahame F Evans
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Hannah G Polikowsky
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, 37232, USA
- Chad D Huff
- Division of Cancer Prevention and Population Sciences, Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Jennifer E Below
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
6
Mahmoudiandehkordi S, Maadooliat M, Schrodi SJ. gwid: an R package and Shiny application for Genome-Wide analysis of IBD data. BIOINFORMATICS ADVANCES 2024; 4:vbae115. [PMID: 39246385 PMCID: PMC11379470 DOI: 10.1093/bioadv/vbae115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 06/13/2024] [Accepted: 07/29/2024] [Indexed: 09/10/2024]
Abstract
Summary: Genome-wide identity by descent (gwid) is an R package developed for the analysis of identity-by-descent (IBD) data pertaining to dichotomous traits. This package offers a set of tools to assess differential IBD levels for the two states of a binary trait, yielding informative and meaningful results. Furthermore, it provides convenient functions to visualize the outcomes of these analyses, enhancing the interpretability and accessibility of the results. To assess the performance of the package, we conducted an evaluation using real genotype data derived from the SNPs to investigate rheumatoid arthritis susceptibility from the Marshfield Clinic Personalized Medicine Research Project.
Availability and implementation: gwid is available as an open-source R package. Release versions can be accessed on CRAN (https://cran.r-project.org/package=gwid) for all major operating systems. The development version is maintained on GitHub (https://github.com/soroushmdg/gwid) and full documentation with examples and workflow templates is provided via the package website (http://tinyurl.com/gwid-tutorial). An interactive R Shiny dashboard is also developed (https://tinyurl.com/gwid-shiny).
Affiliation(s)
- Soroush Mahmoudiandehkordi
- Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53233, United States
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Mehdi Maadooliat
- Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI 53233, United States
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Steven J Schrodi
- Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI 53706, United States
7
Zhang W, Yuan K, Wen R, Li H, Ni X. Reconstruct recent multi-population migration history by using identical-by-descent sharing. J Genet Genomics 2024; 51:642-651. [PMID: 38423503 DOI: 10.1016/j.jgg.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Identity by descent (IBD) is a fundamental genomic characteristic in population genetics and has been widely used for population history reconstruction. However, because an IBD segment captures only the relationship between two individuals or haplotypes, existing IBD-based history inference is constrained to two populations. In this study, we propose a framework that leverages IBD sharing across multiple populations and develop a method, MatrixIBD, to reconstruct recent multi-population migration history. Specifically, we employ structured coalescent theory to model the genealogical process and then estimate the IBD sharing across multiple populations. Within our model, we establish a theoretical connection between migration history and IBD sharing. Our method is evaluated through simulations, which show it to be accurate and robust. Furthermore, we apply MatrixIBD to Central and South Asia in the Human Genome Diversity Project and successfully reconstruct the recent migration history of three closely related populations in South Asia. By taking into account the IBD sharing across multiple populations simultaneously, MatrixIBD gives clearer and more comprehensive insights into the history of regions characterized by complex migration dynamics.
Affiliation(s)
- Wenxiao Zhang
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Kai Yuan
- The Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ru Wen
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Haifang Li
- Baidu Incorporated, Beijing 100085, China
- Xumin Ni
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
8
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668 PMCID: PMC11023918 DOI: 10.1016/j.ajhg.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024] Open
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
9
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. Am J Hum Genet 2023; 110:2077-2091. [PMID: 38065072 PMCID: PMC10716520 DOI: 10.1016/j.ajhg.2023.10.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/08/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023] Open
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Joshua G Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Caoqi Fan
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Bryan Dinh
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Michael D Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
10
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023] Open
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ± 3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
11
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
12
Lancaster MC, Chen HH, Shoemaker MB, Fleming MR, Baker JT, Evans G, Polikowsky HG, Samuels DC, Huff CD, Roden DM, Below JE. Detection of distant relatedness in biobanks for identification of undiagnosed carriers of a Mendelian disease variant: application to Long QT Syndrome. RESEARCH SQUARE 2023:rs.3.rs-3314860. [PMID: 37790303 PMCID: PMC10543295 DOI: 10.21203/rs.3.rs-3314860/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Rare genetic diseases are typically studied in referral populations, resulting in underdiagnosis and biased assessment of penetrance and phenotype. To address this, we developed a generalizable method of genotype inference based on distant relatedness and deployed this to identify undiagnosed Type 5 Long QT Syndrome (LQT5) rare variant carriers in a non-referral population. We identified 9 LQT5 families referred to a single specialty clinic, each carrying p.Asp76Asn, the most common LQT5 variant. We uncovered recent common ancestry and a single shared haplotype among probands. Application to a non-referral population of 69,819 BioVU biobank subjects identified 22 additional subjects sharing this haplotype, subsequently confirmed to carry p.Asp76Asn. Referral and non-referral carriers had prolonged QTc compared to controls, and, among carriers, QTc polygenic score additively associated with QTc prolongation. Thus, our novel analysis of shared chromosomal segments identified undiagnosed cases of genetic disease and refined the understanding of LQT5 penetrance and phenotype.
Affiliation(s)
- Megan C Lancaster
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Hung-Hsin Chen
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- M Benjamin Shoemaker
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Matthew R Fleming
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- James T Baker
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Grahame Evans
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Hannah G Polikowsky
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, 37232, U.S.A
- Chad D Huff
- Division of Cancer Prevention and Population Sciences, Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, U.S.A
- Dan M Roden
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
- Jennifer E Below
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, 37232, U.S.A
13
Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet 2023; 55:768-776. [PMID: 37127670 PMCID: PMC10181934 DOI: 10.1038/s41588-023-01379-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 10/10/2021] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Affiliation(s)
- Brian C Zhang
- Department of Statistics, University of Oxford, Oxford, UK
| | - Arjun Biddanda
- Department of Statistics, University of Oxford, Oxford, UK
| | - Árni Freyr Gunnarsson
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| |
|
14
|
Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CW, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.07.536093. [PMID: 37066144 PMCID: PMC10104234 DOI: 10.1101/2023.04.07.536093] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/18/2023]
Abstract
Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide Association Studies (GWAS) are a powerful way to find genetic loci associated with phenotypes. GWAS are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix given the ARG (local eGRM). Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to identify a large-effect BMI locus, the CREBRF gene, in a sample of Native Hawaiians in which it was not previously detectable by GWAS because of a lack of population-specific imputation resources. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California
| | - Joshua G. Schraiber
- Department of Quantitative and Computational Biology, University of Southern California
| | - Caoqi Fan
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Bryan Dinh
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Nicholas Mancuso
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Charleston W.K. Chiang
- Department of Quantitative and Computational Biology, University of Southern California
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California
| |
|
15
|
Nickchi P, Karunarathna C, Graham J. An exploration of linkage fine-mapping on sequences from case-control studies. Genet Epidemiol 2023; 47:78-94. [PMID: 36047334 PMCID: PMC10087369 DOI: 10.1002/gepi.22502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Revised: 05/30/2022] [Accepted: 08/09/2022] [Indexed: 02/01/2023]
Abstract
Linkage analysis maps genetic loci for a heritable trait by identifying genomic regions with excess relatedness among individuals with similar trait values. Analysis may be conducted on related individuals from families, or on samples of unrelated individuals from a population. For allelically heterogeneous traits, population-based linkage analysis can be more powerful than genotypic-association analysis. Here, we focus on linkage analysis in a population sample, but use sequences rather than individuals as our unit of observation. Earlier investigations of sequence-based linkage mapping relied on known sequence relatedness, whereas we infer relatedness from the sequence data. We propose two ways to associate similarity in relatedness of sequences with similarity in their trait values and compare the resulting linkage methods to two genotypic-association methods. We also introduce a procedure to label case sequences as potential carriers or noncarriers of causal variants after an association has been found. This post hoc labeling of case sequences is based on inferred relatedness to other case sequences. Our simulation results indicate that methods based on sequence relatedness improve localization and perform as well as genotypic-association methods for detecting rare causal variants. Sequence-based linkage analysis therefore has potential to fine-map allelically heterogeneous disease traits.
Affiliation(s)
- Payman Nickchi
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Charith Karunarathna
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada; Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Jinko Graham
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
| |
|
16
|
Shemirani R, Belbin GM, Burghardt K, Lerman K, Avery CL, Kenny EE, Gignoux CR, Ambite JL. Selecting Clustering Algorithms for Identity-By-Descent Mapping. Pacific Symposium on Biocomputing 2023; 28:121-132. [PMID: 36540970 PMCID: PMC9782725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks using IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare the statistical power of clustering algorithms via simulating 2.3 million clusters across 850 experiments. We found Infomap and Markov Clustering (MCL) community detection methods to have high statistical power in most of the scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime three orders of magnitude lower. We also found that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications. We extend our findings to real datasets by analyzing the Population Architecture using Genomics and Epidemiology (PAGE) Study dataset with 51,000 samples and 2 million shared segments on Chromosome 1, resulting in the extraction of 39 million local IBD clusters. We demonstrate the power of our approach by recovering signals of rare genetic variation in the Whole-Exome Sequence data of 200,000 individuals in the UK Biobank. We provide an efficient implementation to enable clustering at scale for IBD mapping for various populations and scenarios. Supplementary Information: The code, along with supplementary methods and figures, is available at https://github.com/roohy/localIBDClustering.
Affiliation(s)
- Ruhollah Shemirani
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
17
|
Tang K, Naseri A, Wei Y, Zhang S, Zhi D. Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts. Gigascience 2022; 11:giac111. [PMID: 36472573 PMCID: PMC9724555 DOI: 10.1093/gigascience/giac111] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2022] [Revised: 08/04/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
In the recent biobank era of genetics, the problem of identical-by-descent (IBD) segment detection has received renewed interest, as IBD segments in large cohorts offer unprecedented opportunities in the study of population and genealogical history, as well as genetic association of long haplotypes. While a new generation of efficient methods for IBD segment detection has become available, direct comparison of these methods is difficult: existing benchmarks were often evaluated in different datasets, with some not openly accessible; methods benchmarked were run under suboptimal parameters; and benchmark performance metrics were not defined consistently. Here, we developed a comprehensive and completely open-source evaluation of the power, accuracy, and resource consumption of these IBD segment detection methods using realistic population genetic simulations with various settings. Our results pave the road for fair evaluation of IBD segment detection methods and provide a practical guide for users.
Affiliation(s)
- Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| |
|
18
|
Revealing the recent demographic history of Europe via haplotype sharing in the UK Biobank. Proc Natl Acad Sci U S A 2022; 119:e2119281119. [PMID: 35696575 PMCID: PMC9233301 DOI: 10.1073/pnas.2119281119] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 02/01/2023]
Abstract
Haplotype-based analyses have recently been leveraged to interrogate the fine-scale structure in specific geographic regions, notably in Europe, although an equivalent haplotype-based understanding across the whole of Europe with these tools is lacking. Furthermore, study of identity-by-descent (IBD) sharing in a large sample of haplotypes across Europe would allow a direct comparison between different demographic histories of different regions. The UK Biobank (UKBB) is a population-scale dataset of genotype and phenotype data collected from the United Kingdom, with established sampling of worldwide ancestries. The exact content of these non-UK ancestries is largely uncharacterized, where study could highlight valuable intracontinental ancestry references with deep phenotyping within the UKBB. In this context, we sought to investigate the sample of European ancestry captured in the UKBB. We studied the haplotypes of 5,500 UKBB individuals with a European birthplace; investigated the population structure and demographic history in Europe, showing in parallel the variety of footprints of demographic history in different genetic regions around Europe; and expanded knowledge of the genetic landscape of the east and southeast of Europe. Providing an updated map of European genetics, we leverage IBD-segment sharing to explore the extent of population isolation and size across the continent. In addition to building and expanding upon previous knowledge in Europe, our results show the UKBB as a source of diverse ancestries beyond Britain. These worldwide ancestries sampled in the UKBB may complement and inform researchers interested in specific communities or regions not limited to Britain.
|
19
|
Belbin GM, Rutledge S, Dodatko T, Cullina S, Turchin MC, Kohli S, Torre D, Yee MC, Gignoux CR, Abul-Husn NS, Houten SM, Kenny EE. Leveraging health systems data to characterize a large effect variant conferring risk for liver disease in Puerto Ricans. Am J Hum Genet 2021; 108:2099-2111. [PMID: 34678161 PMCID: PMC8595966 DOI: 10.1016/j.ajhg.2021.09.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/06/2021] [Accepted: 09/28/2021] [Indexed: 12/22/2022]
Abstract
The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.
Affiliation(s)
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Stephanie Rutledge
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Tetyana Dodatko
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sinead Cullina
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Michael C Turchin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sumita Kohli
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Muh-Ching Yee
- Stanford Functional Genomics Facility, Stanford University, Stanford, CA 94305, USA
| | - Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sander M Houten
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| |
|
20
|
Sticca EL, Belbin GM, Gignoux CR. Current Developments in Detection of Identity-by-Descent Methods and Applications. Front Genet 2021; 12:722602. [PMID: 34567074 PMCID: PMC8461052 DOI: 10.3389/fgene.2021.722602] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Accepted: 08/24/2021] [Indexed: 01/23/2023]
Abstract
Identity-by-descent (IBD), the detection of shared segments inherited from a common ancestor, is a fundamental concept in genomics with broad applications in the characterization and analysis of genomes. While historically the concept of IBD was extensively utilized through linkage analyses and in studies of founder populations, applications of IBD-based methods subsided during the genome-wide association study era. This was primarily due to the computational expense of IBD detection, which becomes increasingly relevant as the field moves toward the analysis of biobank-scale datasets that encompass individuals from highly diverse backgrounds. To address these computational barriers, the past several years have seen new methodological advances enabling IBD detection for datasets in the hundreds of thousands to millions of individuals, enabling novel analyses at an unprecedented scale. Here, we describe the latest innovations in IBD detection and describe opportunities for the application of IBD-based methods across a broad range of questions in the field of genomics.
Affiliation(s)
- Evan L Sticca
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Gillian M Belbin
- Institute for Genomic Health, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Christopher R Gignoux
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| |
|
21
|
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Hyunji Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Yuxiang Zhang
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Won Jun Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jeremiah A Wala
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Thomas P Howard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yueh-Yun Chi
- Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
| | - Rameen Beroukhim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Heng Li
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gavin Ha
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Seth L Alper
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Elizabeth A Mullen
- Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - William C Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Andrew L Hong
- Department of Pediatrics, Emory University, Atlanta, GA, USA
- Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA
| |
|
22
|
Freyman WA, McManus KF, Shringarpure SS, Jewett EM, Bryc K, Auton A. Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows-Wheeler Transform. Mol Biol Evol 2021; 38:2131-2151. [PMID: 33355662 PMCID: PMC8097300 DOI: 10.1093/molbev/msaa328] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
|
23
|
Sole-Navais P, Bacelis J, Helgeland Ø, Modzelewska D, Vaudel M, Flatley C, Andreassen O, Njølstad PR, Muglia LJ, Johansson S, Zhang G, Jacobsson B. Autozygosity mapping and time-to-spontaneous delivery in Norwegian parent-offspring trios. Hum Mol Genet 2020; 29:3845-3858. [PMID: 33291140 PMCID: PMC7861013 DOI: 10.1093/hmg/ddaa255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/03/2020] [Revised: 11/21/2020] [Accepted: 11/24/2020] [Indexed: 11/18/2022]
Abstract
Parental genetic relatedness may lead to adverse health and fitness outcomes in the offspring. However, the degree to which it affects human delivery timing is unknown. We use genotype data from ≃25 000 parent-offspring trios from the Norwegian Mother, Father and Child Cohort Study to optimize runs of homozygosity (ROH) calling by maximizing the correlation between parental genetic relatedness and offspring ROHs. We then estimate the effect of maternal, paternal and fetal autozygosity and that of autozygosity mapping (common segments and gene burden test) on the timing of spontaneous onset of delivery. The correlation between offspring ROH using a variety of parameters and parental genetic relatedness ranged between −0.2 and 0.6, revealing the importance of the minimum number of genetic variants included in an ROH and the use of genetic distance. The optimized compared to predefined parameters showed a ≃45% higher correlation between parental genetic relatedness and offspring ROH. We found no evidence of an effect of maternal, paternal, or fetal overall autozygosity on spontaneous delivery timing. Yet, through autozygosity mapping, we identified three maternal loci (the TBC1D1, SIGLECs and EDN1 gene regions) reducing the median time-to-spontaneous onset of delivery by ≃2–5% (P-value < 2.3 × 10⁻⁶). We also found suggestive evidence of a fetal locus at 3q22.2, near the RYK gene region (P-value = 2.0 × 10⁻⁶). Autozygosity mapping may provide new insights on the genetic determinants of delivery timing beyond traditional genome-wide association studies, but particular and rigorous attention should be given to ROH calling parameter selection.
Affiliation(s)
- Pol Sole-Navais
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
| | - Jonas Bacelis
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
| | - Øyvind Helgeland
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway
| | - Dominika Modzelewska
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden
| | - Marc Vaudel
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen 5021, Norway
| | - Christopher Flatley
- Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway
| | - Ole Andreassen
- NORMENT, University of Oslo, Oslo 0450, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo 0450, Norway; Department of Psychiatry, University of California San Diego, San Diego, CA 92093, USA
| | - Pål R Njølstad
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Department of Pediatrics and Adolescents, Haukeland University Hospital, Bergen 5021, Norway
| | - Louis J Muglia
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Human Genetics, The Center for Prevention of Preterm Birth, Perinatal Institute, March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45267, USA
| | - Stefan Johansson
- Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Center for Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
| | - Ge Zhang
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA; Division of Human Genetics, The Center for Prevention of Preterm Birth, Perinatal Institute, March of Dimes Prematurity Research Center Ohio Collaborative, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45267, USA
| | - Bo Jacobsson
- Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg 41685, Sweden; Division of Health Data and Digitalization, Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo 0213, Norway; Department of Obstetrics and Gynecology, Sahlgrenska University Hospital, Gothenburg 41685, Sweden
|
24
|
Nait Saada J, Kalantzis G, Shyr D, Cooper F, Robinson M, Gusev A, Palamara PF. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat Commun 2020; 11:6130. [PMID: 33257650 PMCID: PMC7704644 DOI: 10.1038/s41467-020-19588-x] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 04/02/2020] [Accepted: 10/02/2020] [Indexed: 12/14/2022]
Abstract
Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.
Affiliation(s)
| | | | - Derek Shyr
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Martin Robinson
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Alexander Gusev
- Brigham & Women's Hospital, Division of Genetics, Boston, MA, 02215, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
|
25
|
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet 2020; 36:857-867. [PMID: 32773169 PMCID: PMC7572808 DOI: 10.1016/j.tig.2020.07.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 10/23/2022]
Abstract
One of the forerunners that pioneered the revolution of high-throughput genomic technologies is the genotyping microarray technology, which can genotype millions of single-nucleotide variants simultaneously. Owing to apparent benefits, such as high speed, low cost, and high throughput, the genotyping array has gained lasting applications in genome-wide association studies (GWAS) and thus accumulated an enormous amount of data. Empowered by continuous manufactural upgrades and analytical innovation, unconventional applications of genotyping array data have emerged to address more diverse genetic problems, holding promise of boosting genetic research into human diseases through the re-mining of the rich accumulated data. Here, we review several unconventional genotyping array analysis techniques that have been built on the idea of large-scale multivariant analysis and provide empirical application examples. These unconventional outcomes of genotyping arrays include polygenic score, runs of homozygosity (ROH)/heterozygosity ratio, distant pedigree computation, and mitochondrial DNA (mtDNA) copy number inference.
Affiliation(s)
- David C Samuels
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
| | - Jennifer E Below
- Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Scott Ness
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Hui Yu
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Shuguang Leng
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
| | - Yan Guo
- Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA.
|
26
|
Han Z, Hu Y, Tian Q, Cao Y, Si A, Si Z, Zang Y, Xu C, Shen W, Dai F, Liu X, Fang L, Chen H, Zhang T. Genomic signatures and candidate genes of lint yield and fibre quality improvement in Upland cotton in Xinjiang. PLANT BIOTECHNOLOGY JOURNAL 2020; 18:2002-2014. [PMID: 32030869 PMCID: PMC7540456 DOI: 10.1111/pbi.13356] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/07/2019] [Accepted: 01/21/2020] [Indexed: 06/10/2023]
Abstract
Xinjiang has been the largest and highest yield cotton production region not only in China, but also in the world. Improvements in Upland cotton cultivars in Xinjiang have occurred via pedigree selection and/or crossing of elite alleles from the former Soviet Union and other cotton producing regions of China. But it is unclear how genomic constitutions from foundation parents have been selected and inherited. Here, we deep-sequenced seven historic foundation parents, comprising four cultivars introduced from the former Soviet Union (108Ф, C1470, 611Б and KK1543) and three from the United States and Africa (DPL15, STV2B and UGDM), and re-sequenced sixty-nine Xinjiang modern cultivars. Phylogenetic analysis of more than 2 million high-quality single nucleotide polymorphisms allowed their classification into two groups, suggesting that Xinjiang Upland cotton cultivars were not only spawned from 108Ф, C1470, 611Б and KK1543, but also had a close kinship with DPL15, STV2B and UGDM. Notably, identity-by-descent (IBD) tracking demonstrated that the former Soviet Union cultivars have made a huge contribution to modern cultivar improvement in Xinjiang. A total of 156 selective sweeps were identified. Among them, apoptosis-antagonizing transcription factor gene (GhAATF1) and mitochondrial transcription termination factor family protein gene (GhmTERF1) were highly involved in the determination of lint percentage. Additionally, the auxin response factor gene (GhARF3) located in inherited IBD segments from 108Ф and 611Б was highly correlated with fibre quality. These results provide an insight into the genomics of artificial selection for improving cotton production and facilitate next-generation precision breeding of cotton and other crops.
Affiliation(s)
- Zegang Han
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Qin Tian
- Key Laboratory of China Northwestern Inland Region, Ministry of Agriculture, Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Yiwen Cao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Aijun Si
- Key Laboratory of China Northwestern Inland Region, Ministry of Agriculture, Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Zhanfeng Si
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Yihao Zang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Chenyu Xu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Weijuan Shen
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Fan Dai
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Xia Liu
- Esquel Group, Wanchai, Hong Kong, China
| | - Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
| | - Hong Chen
- Key Laboratory of China Northwestern Inland Region, Ministry of Agriculture, Cotton Research Institute, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Tianzhen Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
|
27
|
Stapleton CP, Lord GM, Conlon PJ, Cavalleri GL. The relationship between donor-recipient genetic distance and long-term kidney transplant outcome. HRB Open Res 2020; 3:47. [PMID: 33655195 PMCID: PMC7888353 DOI: 10.12688/hrbopenres.13021.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 06/29/2020] [Indexed: 12/18/2022]
Abstract
Background: We set out to quantify shared genetic ancestry between unrelated kidney donor-recipient pairs and test it as a predictor of time to graft failure. Methods: In a homogeneous, unrelated, European cohort of deceased-donor kidney transplant pairs (n pairs = 1,808), we calculated, using common genetic variation, shared ancestry at the genic (n loci = 40,053) and genomic level. We conducted a sub-analysis focused on transmembrane protein coding genes (n transcripts = 8,637) and attempted replication of a previously published nonsynonymous transmembrane mismatch score. Measures of shared genetic ancestry were tested in a survival model against time to death-censored graft failure. Results: Shared ancestry calculated across the human leukocyte antigen (HLA) was significantly associated with graft survival in individuals who had a high serological mismatch (n pairs = 186) with those who did not have any HLA mismatches, indicating that shared ancestry calculated at specific loci can capture known associations with genes impacting graft outcome. None of the other measures of shared ancestry at a genic level, genome-wide scale, transmembrane subset or nonsynonymous transmembrane mismatch score analysis were significant predictors of time to graft failure. Conclusions: In a large unrelated, deceased-donor European ancestry renal transplant cohort, shared donor-recipient genetic ancestry, calculated using common genetic variation, has limited value in predicting transplant outcome both on a genomic scale and at a genic level (other than at the HLA loci).
Affiliation(s)
- Caragh P. Stapleton
- Department of Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Graham M. Lord
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- NIHR Biomedical Research Centre at Guy’s and St Thomas’, NHS Foundation Trust and King’s College London, London, UK
| | - UK and Ireland Renal Transplant Consortium
- Department of Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
- Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- NIHR Biomedical Research Centre at Guy’s and St Thomas’, NHS Foundation Trust and King’s College London, London, UK
- Department of Nephrology, Beaumont Hospital, Dublin, Dublin, Ireland
- Department of Medicine, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Peter J. Conlon
- Department of Nephrology, Beaumont Hospital, Dublin, Dublin, Ireland
- Department of Medicine, Royal College of Surgeons in Ireland, Dublin, Ireland
| | - Gianpiero L. Cavalleri
- Department of Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin, Ireland
|
28
|
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022]
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
Affiliation(s)
- Ying Zhou
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA.
|
29
|
Abstract
Extremely rare diseases are increasingly recognized owing to widespread, inexpensive genomic sequencing. Understanding the incidence of rare disease is important for appreciating its health impact and allocating resources for research. However, estimating the incidence of rare disease is challenging because the individual contributory alleles are themselves extremely rare. We propose a new method to determine the incidence of rare, severe, recessive disease in non-consanguineous populations that uses known allele frequencies, estimates the combined allele frequency of observed alleles, and estimates the number of causative alleles that are thus far unobserved in a disease cohort. Experiments on simulated and real data show that this approach is a feasible method to estimate the incidence of rare disease in European populations but, owing to several limitations in our ability to assess the full spectrum of pathogenic mutations, it serves as a useful tool to provide a lower threshold on disease incidence.
|
30
|
Taylor AR, Jacob PE, Neafsey DE, Buckee CO. Estimating Relatedness Between Malaria Parasites. Genetics 2019; 212:1337-1351. [PMID: 31209105 PMCID: PMC6707449 DOI: 10.1534/genetics.119.302120] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 03/12/2019] [Accepted: 06/03/2019] [Indexed: 11/18/2022]
Abstract
Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date have used a range of different approaches on an ad hoc basis. Therefore, it remains unclear how to compare different studies and which measures to use. Here, we systematically compare measures based on identity-by-state (IBS) and identity-by-descent (IBD) using a globally diverse data set of malaria parasites, Plasmodium falciparum and P. vivax, and provide marker requirements for estimates based on IBD. We formally show that the informativeness of polyallelic markers for relatedness inference is maximized when alleles are equifrequent. Estimates based on IBS are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we thus recommend estimates based on IBD. To generate estimates with errors below an arbitrary threshold of 0.1, we recommend ∼100 polyallelic or 200 biallelic markers. Marker requirements are immediately applicable to haploid malaria parasites and other haploid eukaryotes. C.I.s facilitate comparison when different marker sets are used. This is the first attempt to provide rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology. We hope it will provide a basis for statistically informed prospective study design and surveillance strategies.
Affiliation(s)
- Aimee R Taylor
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Pierre E Jacob
- Department of Statistics, Harvard University, Cambridge, Massachusetts 02138
| | - Daniel E Neafsey
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115
| | - Caroline O Buckee
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115
|
31
|
Naseri A, Liu X, Tang K, Zhang S, Zhi D. RaPID: ultra-fast, powerful, and accurate detection of segments identical by descent (IBD) in biobank-scale cohorts. Genome Biol 2019; 20:143. [PMID: 31345249 PMCID: PMC6659282 DOI: 10.1186/s13059-019-1754-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/26/2019] [Accepted: 07/03/2019] [Indexed: 11/10/2022]
Abstract
While genetic relatedness, usually manifested as segments identical by descent (IBD), is ubiquitous in modern large biobanks, current IBD detection methods are not efficient at such a scale. Here, we describe an efficient method, RaPID, for detecting IBD segments in a panel with phased haplotypes. RaPID achieves a time and space complexity linear in the input size and the number of reported IBDs. With simulation, we showed that RaPID is orders of magnitude faster than existing methods while offering competitive power and accuracy. In UK Biobank, RaPID identified 3,335,807 IBDs with a length ≥ 10 cM among 223,507 male X chromosomes in 11 min.
Affiliation(s)
- Ardalan Naseri
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA
| | - Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Department of Epidemiology, Human Genetics & Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
|
32
|
Barrot CC, Woillard JB, Picard N. Big data in pharmacogenomics: current applications, perspectives and pitfalls. Pharmacogenomics 2019; 20:609-620. [PMID: 31190620 DOI: 10.2217/pgs-2018-0184] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/19/2023]
Abstract
The efficiency of new generation sequencing methods and the reduction of their cost have led pharmacogenomics to gradually supplant pharmacogenetics, leading to new applications in personalized medicine along with new perspectives in drug design or identification of drug response factors. The amount of data generated in genomics fits the definition of big data and needs specific bioinformatics processing following standard steps: data collection, processing, analysis and interpretation. Pitfalls of pharmacogenomics studies are directly related to these steps. This review aims to describe these steps from a pharmacogenomic point of view, focusing on bioinformatics aspects.
Affiliation(s)
- Claire-Cécile Barrot
- INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
| | - Jean-Baptiste Woillard
- INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
| | - Nicolas Picard
- INSERM, IPPRITT, U1248, F-87000, Limoges, France; Univ. Limoges, IPPRITT, F-87000 Limoges, France
|
33
|
My Cousin Also Has Atrial Fibrillation: Family Relationships in a Genomic Era. JACC Clin Electrophysiol 2019; 5:501-503. [PMID: 31000105 DOI: 10.1016/j.jacep.2019.03.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/11/2019] [Revised: 03/05/2019] [Accepted: 03/06/2019] [Indexed: 01/24/2023]
|
34
|
Harold D, Connolly S, Riley BP, Kendler KS, McCarthy SE, McCombie WR, Richards A, Owen MJ, O'Donovan MC, Walters J, Donohoe G, Gill M, Corvin A, Morris DW. Population-based identity-by-descent mapping combined with exome sequencing to detect rare risk variants for schizophrenia. Am J Med Genet B Neuropsychiatr Genet 2019; 180:223-231. [PMID: 30801977 PMCID: PMC8863274 DOI: 10.1002/ajmg.b.32716] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/16/2018] [Revised: 10/22/2018] [Accepted: 12/03/2018] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) are highly effective at identifying common risk variants for schizophrenia. Rare risk variants are also important contributors to schizophrenia etiology but, with the exception of large copy number variants, are difficult to detect with GWAS. Exome and genome sequencing, which have accelerated the study of rare variants, are expensive, so alternative methods are needed to aid detection of rare variants. Here we re-analyze an Irish schizophrenia GWAS dataset (n = 3,473) by performing identity-by-descent (IBD) mapping followed by exome sequencing of individuals identified as sharing risk haplotypes to search for rare risk variants in coding regions. We identified 45 rare haplotypes (>1 cM) that were significantly more common in cases than controls. By exome sequencing 105 haplotype carriers, we investigated these haplotypes for functional coding variants that could be tested for association in independent GWAS samples. We identified one rare missense variant in PCNT but did not find statistical support for an association with schizophrenia in a replication analysis. However, IBD mapping can prioritize both individual samples and genomic regions for follow-up analysis, although genome rather than exome sequencing may be more effective at detecting risk variants on rare haplotypes.
Affiliation(s)
- Denise Harold
- Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine and Discipline of Psychiatry, Trinity College Dublin, Dublin, Ireland
- School of Biotechnology, Dublin City University, Dublin, Ireland
| | - Siobhan Connolly
- Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine and Discipline of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Brien P Riley
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia
| | - Kenneth S Kendler
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia
| | - Shane E McCarthy
- The Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
| | - William R McCombie
- The Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
| | - Alex Richards
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, United Kingdom
| | - Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, United Kingdom
| | - Michael C O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, United Kingdom
| | - James Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University School of Medicine, Cardiff, United Kingdom
| | - Gary Donohoe
- Cognitive Genetics and Cognitive Therapy Group, Neuroimaging, Cognition & Genomics (NICOG) Centre & NCBES Galway Neuroscience Centre, School of Psychology and Discipline of Biochemistry, National University of Ireland Galway, Galway, Ireland
| | - Michael Gill
- Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine and Discipline of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine and Discipline of Psychiatry, Trinity College Dublin, Dublin, Ireland
| | - Derek W Morris
- Cognitive Genetics and Cognitive Therapy Group, Neuroimaging, Cognition & Genomics (NICOG) Centre & NCBES Galway Neuroscience Centre, School of Psychology and Discipline of Biochemistry, National University of Ireland Galway, Galway, Ireland
|
35
|
Yang X, Wu W, Peng M, Shen Q, Feng J, Lai W, Zhu H, Tu C, Quan X, Chen Y, Qin L, Li D, He L, Zhang Y. Identity-by-Descent Analysis Reveals Susceptibility Loci for Severe Acne in Chinese Han Cohort. J Invest Dermatol 2019; 139:2049-2051.e20. [PMID: 30922884 DOI: 10.1016/j.jid.2019.03.1132] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/01/2017] [Revised: 02/20/2019] [Accepted: 03/05/2019] [Indexed: 10/27/2022]
Affiliation(s)
- Xingyan Yang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, China
| | - Wenjuan Wu
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Institute of Dermatology and Venereology of Yunnan Province, Kunming, China
| | - Minsheng Peng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, China; KIZ/CUHK Joint Laboratory of Bio-Resources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Quankuan Shen
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, China
| | - Jiaqi Feng
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Institute of Dermatology and Venereology of Yunnan Province, Kunming, China
| | - Wei Lai
- Department of Dermatology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Huilan Zhu
- Guangzhou Institute of Dermatology, Guangzhou, China
| | - Caixia Tu
- Department of Dermatology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
| | - Xiaorong Quan
- Guilin Skin Disease Prevention and Treatment Hospital, Guilin, China
| | - Yihong Chen
- Department of Dermatology, Zhangzhou Affiliated Hospital of Fujian Medical University, Zhangzhou, China
| | - Lanying Qin
- Department of Dermatology, Cangzhou People's Hospital, Cangzhou, China
| | - Donglin Li
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Institute of Dermatology and Venereology of Yunnan Province, Kunming, China
| | - Li He
- Department of Dermatology, First Affiliated Hospital of Kunming Medical University, Institute of Dermatology and Venereology of Yunnan Province, Kunming, China.
| | - Yaping Zhang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, China; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, China; KIZ/CUHK Joint Laboratory of Bio-Resources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
|
36
|
Fazia T, Pastorino R, Foco L, Han L, Abney M, Beecham A, Hadjixenofontos A, Guo H, Gentilini D, Papachristou C, Bitti PP, Ticca A, Berzuini C, McCauley JL, Bernardinelli L. Investigating multiple sclerosis genetic susceptibility on the founder population of east-central Sardinia via association and linkage analysis of immune-related loci. Mult Scler 2018; 24:1815-1824. [PMID: 28933650 DOI: 10.1177/1352458517732841] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
BACKGROUND: A wealth of single-nucleotide polymorphisms (SNPs) responsible for multiple sclerosis (MS) susceptibility have been identified; however, they explain only a fraction of MS heritability. OBJECTIVES: We contributed to discovery of new MS susceptibility SNPs by studying a founder population with high MS prevalence. METHODS: We analyzed ImmunoChip data from 15 multiplex families and 94 unrelated controls from the Nuoro Province, Sardinia, Italy. We tested each SNP for both association and linkage with MS, the linkage being explored in terms of identity-by-descent (IBD) sharing excess and using gene dropping to compute a corresponding empirical p-value. By targeting regions that are both associated and in linkage with MS, we increase chances of identifying interesting genomic regions. RESULTS: We identified 486 MS-associated (p < 1 × 10⁻⁴) and 18,426 MS-linked (p < 0.05) SNPs. A total of 111 loci were both linked and associated with MS, 18 of them pointing to 14 non-major histocompatibility complex (MHC) genes, and 93 of them located in the MHC region. CONCLUSION: We discovered new suggestive signals and confirmed some previously identified ones. We believe this to represent a significant step toward an understanding of the genetic basis of MS.
Affiliation(s)
- Teresa Fazia
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
- Roberta Pastorino
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
- Luisa Foco
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy; Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy
- Lide Han
- Department of Human Genetics, The University of Chicago, Chicago, IL, USA
- Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, IL, USA
- Ashley Beecham
- John P. Hussmann Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Athena Hadjixenofontos
- John P. Hussmann Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Hui Guo
- Center for Biostatistics, Institute of Population Health, The University of Manchester, Manchester, UK
- Davide Gentilini
- Unità di Bioinformatica e Statistica Genomica, Istituto Auxologico Italiano-IRCCS, Milano, Italy
- Pier Paolo Bitti
- Immunoematologia e Medicina Trasfusionale, Ospedale "San Francesco" Nuoro, ASSL Nuoro, Azienda Tutela Salute Sardegna, Nuoro, Italy
- Anna Ticca
- Neurologia e Stroke Unit, Ospedale "San Francesco" Nuoro, ASSL Nuoro, Azienda Tutela Salute Sardegna, Nuoro, Italy
- Carlo Berzuini
- Center for Biostatistics, Institute of Population Health, The University of Manchester, Manchester, UK
- Jacob L McCauley
- John P. Hussmann Institute for Human Genomics and Dr John Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Luisa Bernardinelli
- Department of Brain and Behavioral Science, University of Pavia, Pavia, Italy
37
Martin AR, Karczewski KJ, Kerminen S, Kurki MI, Sarin AP, Artomov M, Eriksson JG, Esko T, Genovese G, Havulinna AS, Kaprio J, Konradi A, Korányi L, Kostareva A, Männikkö M, Metspalu A, Perola M, Prasad RB, Raitakari O, Rotar O, Salomaa V, Groop L, Palotie A, Neale BM, Ripatti S, Pirinen M, Daly MJ. Haplotype Sharing Provides Insights into Fine-Scale Population History and Disease in Finland. Am J Hum Genet 2018; 102:760-775. [PMID: 29706349 DOI: 10.1016/j.ajhg.2018.03.003] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/10/2017] [Accepted: 02/28/2018] [Indexed: 01/23/2023] Open
Abstract
Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assembled a comprehensive view of recent population history (≤100 generations), the timespan during which most rare-disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to that of 16,060 Swedes, Estonians, Russians, and Hungarians from geographically and linguistically adjacent countries with different population histories. We find much more extensive sharing in Finns, with at least one ≥ 5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from more than 25,000 individuals, we find that although haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland typically share several-fold more of their genome in identity-by-descent segments than individuals from southwest regions. We estimate recent effective population-size changes through time across regions of Finland, and we find that there was more continuous gene flow as Finns migrated from southwest to northeast between the early- and late-settlement regions than was dichotomously described previously. Lastly, we show that haplotype sharing is locally enriched by an order of magnitude among pairs of individuals sharing rare alleles and especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.
Affiliation(s)
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sini Kerminen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland
- Mitja I Kurki
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Psychiatric and Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA
- Antti-Pekka Sarin
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; National Institute for Health and Welfare of Finland, Helsinki 00271, Finland
- Mykyta Artomov
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Johan G Eriksson
- National Institute for Health and Welfare of Finland, Helsinki 00271, Finland; Folkhälsan Research Center, Helsinki 00290, Finland; Department of General Practice and Primary Health Care, University of Helsinki and Helsinki University Hospital, Helsinki 00014, Finland
- Tõnu Esko
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Estonian Genome Center, University of Tartu, Tartu 50090, Estonia
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Aki S Havulinna
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; National Institute for Health and Welfare of Finland, Helsinki 00271, Finland
- Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Department of Public Health, University of Helsinki, Helsinki 00014, Finland
- Alexandra Konradi
- Almazov National Medical Research Centre, Saint Petersburg 197341, Russia; National Research University of Information Technologies, Mechanics, and Optics, Saint Petersburg 197101, Russia
- László Korányi
- Heart Center Foundation, Drug Research Centre, Balatonfured H-8230, Hungary
- Anna Kostareva
- Almazov National Medical Research Centre, Saint Petersburg 197341, Russia; National Research University of Information Technologies, Mechanics, and Optics, Saint Petersburg 197101, Russia
- Minna Männikkö
- Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu 90014, Finland
- Andres Metspalu
- Estonian Genome Center, University of Tartu, Tartu 50090, Estonia
- Markus Perola
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Estonian Genome Center, University of Tartu, Tartu 50090, Estonia; Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku University Hospital, Turku 20520, Finland
- Rashmi B Prasad
- Lund University Diabetes Centre, Department of Clinical Sciences, Lund University CRC, Skåne University Hospital Malmö, SE-205 02, Malmö, Sweden
- Olli Raitakari
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku University Hospital, Turku 20520, Finland; Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku 20520, Finland
- Oxana Rotar
- Almazov National Medical Research Centre, Saint Petersburg 197341, Russia
- Veikko Salomaa
- National Institute for Health and Welfare of Finland, Helsinki 00271, Finland
- Leif Groop
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Lund University Diabetes Centre, Department of Clinical Sciences, Lund University CRC, Skåne University Hospital Malmö, SE-205 02, Malmö, Sweden
- Aarno Palotie
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Psychiatric and Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Department of Public Health, University of Helsinki, Helsinki 00014, Finland
- Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland; Department of Public Health, University of Helsinki, Helsinki 00014, Finland; Helsinki Institute for Information Technology and Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland
38
Evans LM, Tahmasbi R, Jones M, Vrieze SI, Abecasis GR, Das S, Bjelland DW, de Candia TR, Yang J, Goddard ME, Visscher PM, Keller MC. Narrow-sense heritability estimation of complex traits using identity-by-descent information. Heredity (Edinb) 2018; 121:616-630. [PMID: 29588506 DOI: 10.1038/s41437-018-0067-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/17/2017] [Revised: 01/30/2018] [Accepted: 02/19/2018] [Indexed: 01/10/2023] Open
Abstract
Heritability is a fundamental parameter in genetics. Traditional estimates based on family or twin studies can be biased due to shared environmental or non-additive genetic variance. Alternatively, those based on genotyped or imputed variants typically underestimate narrow-sense heritability contributed by rare or otherwise poorly tagged causal variants. Identical-by-descent (IBD) segments of the genome share all variants between pairs of chromosomes except new mutations that have arisen since the last common ancestor. Therefore, relating phenotypic similarity to degree of IBD sharing among classically unrelated individuals is an appealing approach to estimating the near full additive genetic variance while possibly avoiding biases that can occur when modeling close relatives. We applied an IBD-based approach (GREML-IBD) to estimate heritability in unrelated individuals using phenotypic simulation with thousands of whole-genome sequences across a range of stratification, polygenicity levels, and the minor allele frequencies of causal variants (CVs). In simulations, the IBD-based approach produced unbiased heritability estimates, even when CVs were extremely rare, although precision was low. However, population stratification and non-genetic familial environmental effects shared across generations led to strong biases in IBD-based heritability. We used data on two traits in ~120,000 people from the UK Biobank to demonstrate that, depending on the trait and possible confounding environmental effects, GREML-IBD can be applied to very large genetic datasets to infer the contribution of very rare variants lost using other methods. However, we observed apparent biases in these real data, suggesting that more work may be required to understand and mitigate factors that influence IBD-based heritability estimates.
Affiliation(s)
- Luke M Evans
- Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA
- Rasool Tahmasbi
- Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA
- Matt Jones
- Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, 80309, USA
- Scott I Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, 55455, USA
- Gonçalo R Abecasis
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Sayantan Das
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
- Douglas W Bjelland
- Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA
- Teresa R de Candia
- Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA
- Jian Yang
- Institute for Molecular Bioscience and the Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Michael E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia; Department of Economic Development, Jobs, Transport and Resources, Biosciences Research, Melbourne, VIC, Australia
- Peter M Visscher
- Institute for Molecular Bioscience and the Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Matthew C Keller
- Institute for Behavioral Genetics, University of Colorado, Boulder, CO, 80309, USA; Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, 80309, USA
39
Couto AR, Parreira B, Thomson R, Soares M, Power DM, Stankovich J, Armas JB, Brown MA. Combined approach for finding susceptibility genes in DISH/chondrocalcinosis families: whole-genome-wide linkage and IBS/IBD studies. Hum Genome Var 2017; 4:17041. [PMID: 29104755 PMCID: PMC5666909 DOI: 10.1038/hgv.2017.41] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/22/2017] [Accepted: 07/29/2017] [Indexed: 11/27/2022] Open
Abstract
Twelve families with exuberant and early-onset calcium pyrophosphate dihydrate chondrocalcinosis (CC) and diffuse idiopathic skeletal hyperostosis (DISH), hereafter designated DISH/CC, were identified in Terceira Island, the Azores, Portugal. Ninety-two (92) individuals from these families were selected for whole-genome-wide linkage analysis. An identity-by-descent (IBD) analysis was performed in 10 individuals from 5 of the investigated pedigrees. The chromosome area with the maximal logarithm of the odds score (1.32; P=0.007) was not identified using the IBD/identity-by-state (IBS) analysis; therefore, it was not investigated further. From the IBD/IBS analysis, two candidate genes, LEMD3 and RSPO4, were identified and sequenced. Nine genetic variants were identified in the RSPO4 gene; one regulatory variant (rs146447064) was significantly more frequent in control individuals than in DISH/CC patients (P=0.03). Four variants were identified in LEMD3, and the rs201930700 variant was further investigated using segregation analysis. None of the genetic variants in RSPO4 or LEMD3 segregated within the studied families. Therefore, although a major genetic effect was shown to determine DISH/CC occurrence within these families, the specific genetic variants involved were not identified.
Affiliation(s)
- Ana Rita Couto
- Serviço Especializado de Epidemiologia e Biologia Molecular (SEEBMO), Hospital de Santo Espírito da Ilha Terceira (HSEIT), Angra do Heroísmo, Portugal
- Bruna Parreira
- Serviço Especializado de Epidemiologia e Biologia Molecular (SEEBMO), Hospital de Santo Espírito da Ilha Terceira (HSEIT), Angra do Heroísmo, Portugal
- Russell Thomson
- Center for Research in Mathematics, Western Sydney University, Penrith, Australia
- Marta Soares
- Serviço Especializado de Epidemiologia e Biologia Molecular (SEEBMO), Hospital de Santo Espírito da Ilha Terceira (HSEIT), Angra do Heroísmo, Portugal
- Deborah M Power
- Center of Marine Sciences (CCMAR), Universidade do Algarve, Faro, Portugal
- Jim Stankovich
- Menzies Institute for Medical Research, University of Tasmania, Hobart, Australia
- Jácome Bruges Armas
- Serviço Especializado de Epidemiologia e Biologia Molecular (SEEBMO), Hospital de Santo Espírito da Ilha Terceira (HSEIT), Angra do Heroísmo, Portugal; CEDOC-Chronic Diseases Research Center, Universidade Nova de Lisboa, Lisboa, Portugal
- Matthew A Brown
- Translational Genomics Group, Institute of Health and Biomedical Innovation, Translational Research Institute, Queensland University of Technology, Brisbane, Australia
40
Hellwege J, Keaton J, Giri A, Gao X, Velez Edwards DR, Edwards TL. Population Stratification in Genetic Association Studies. Current Protocols in Human Genetics 2017; 95:1.22.1-1.22.23. [PMID: 29044472 PMCID: PMC6007879 DOI: 10.1002/cphg.48] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Indexed: 12/20/2022]
Abstract
Population stratification (PS) is a primary consideration in studies of genetic determinants of human traits. Failure to control for PS may lead to confounding, causing a study to fail for lack of significant results, or resources to be wasted following false-positive signals. Here, historical and current approaches for addressing PS when performing genetic association studies in human populations are reviewed. Methods for detecting the presence of PS, including global and local ancestry methods, are described. Also described are approaches for accounting for PS when calculating association statistics, such that measures of association are not confounded. Many traits are being examined for the first time in minority populations, which may inherently feature PS. © 2017 by John Wiley & Sons, Inc.
Affiliation(s)
- Jacklyn Hellwege
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Jacob Keaton
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Ayush Giri
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Xiaoyi Gao
- Department of Ophthalmology and Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Digna R. Velez Edwards
- Vanderbilt Genetics Institute, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Todd L. Edwards
- Vanderbilt Genetics Institute, Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37203, USA
41
Belbin GM, Odgis J, Sorokin EP, Yee MC, Kohli S, Glicksberg BS, Gignoux CR, Wojcik GL, Van Vleck T, Jeff JM, Linderman M, Schurmann C, Ruderfer D, Cai X, Merkelson A, Justice AE, Young KL, Graff M, North KE, Peters U, James R, Hindorff L, Kornreich R, Edelmann L, Gottesman O, Stahl EE, Cho JH, Loos RJ, Bottinger EP, Nadkarni GN, Abul-Husn NS, Kenny EE. Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. eLife 2017; 6:25060. [PMID: 28895531 PMCID: PMC5595434 DOI: 10.7554/elife.25060] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 01/11/2017] [Accepted: 08/09/2017] [Indexed: 11/16/2022] Open
Abstract
Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.
Diseases often run in families. These diseases are frequently linked to changes in DNA that are passed down through generations. Close family members may share these disease-causing mutations; so may distant relatives who inherited the same mutation from a common ancestor long ago. Geneticists use a method called linkage mapping to trace a disease found in multiple members of a family over generations to genetic changes in a shared ancestor. This allows scientists to pinpoint the exact place in the genome the disease-causing mutation occurred. Using computer algorithms, scientists can apply the same technique to identify mutations that distant relatives inherited from a common ancestor. Belbin et al. used this computational technique to identify a mutation that may cause unusually short stature or bone and joint problems in up to 2% of people of Puerto Rican descent.
In the experiments, the genomes of about 32,000 New Yorkers who have volunteered to participate in the BioMe Biobank and their health records were used to search for genetic changes linked to extremely short stature. The search revealed that people who inherited two copies of this mutation from their parents were likely to be extremely short or to have bone and joint problems. People who inherited one copy had an increased likelihood of joint or bone problems. This mutation affects a gene responsible for making a form of protein called collagen that is important for bone growth. The analysis suggests the mutation first arose in a Native American ancestor living in Puerto Rico around the time that European colonization began. The mutation had previously been linked to a disorder called Steel syndrome that was thought to be rare. Belbin et al. showed this condition is actually fairly common in people whose ancestors recently came from Puerto Rico, but may often go undiagnosed by their physicians. The experiments emphasize the importance of including diverse populations in genetic studies, as studies of people of predominantly European descent would likely have missed the link between this disease and mutation.
Affiliation(s)
- Gillian Morven Belbin
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Jacqueline Odgis
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Elena P Sorokin
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Muh-Ching Yee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, United States
- Sumita Kohli
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Benjamin S Glicksberg
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States
- Christopher R Gignoux
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Genevieve L Wojcik
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Tielman Van Vleck
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Janina M Jeff
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Michael Linderman
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Claudia Schurmann
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Douglas Ruderfer
- Broad Institute, Cambridge, United States; Division of Psychiatric Genomics, Icahn School of Medicine at Mt Sinai, New York, United States; Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
- Xiaoqiang Cai
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Amanda Merkelson
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Anne E Justice
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Kristin L Young
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Misa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States; Department of Epidemiology, University of Washington School of Public Health, Seattle, United States
- Regina James
- National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, United States
- Lucia Hindorff
- National Human Genome Research Institute, National Institutes of Health, Bethesda, United States
- Ruth Kornreich
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Lisa Edelmann
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States
- Omri Gottesman
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Eli Ea Stahl
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Harris Center for Precision Wellness, Icahn School of Medicine at Mt Sinai, New York, United States; Broad Institute, Cambridge, United States
- Judy H Cho
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, United States
- Ruth Jf Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, United States
- Erwin P Bottinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Girish N Nadkarni
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States
- Noura S Abul-Husn
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Eimear E Kenny
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, United States; The Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States; Center for Statistical Genetics, Icahn School of Medicine at Mt Sinai, New York, United States
42
Exploring Identity-By-Descent Segments and Putative Functions Using Different Foundation Parents in Maize. PLoS One 2016; 11:e0168374. [PMID: 27997600 PMCID: PMC5172581 DOI: 10.1371/journal.pone.0168374] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/04/2016] [Accepted: 11/29/2016] [Indexed: 02/06/2023] Open
Abstract
Maize foundation parents (FPs) play an irreplaceable role in hybrid breeding because they have been widely used in the development of new lines and hybrids. The combination of different identity-by-descent (IBD) segments and genes could account for the formation patterns of different FPs, and knowledge of these IBD regions would provide an extensive foundation for the development of new candidate FP lines in future maize breeding. In this paper, a panel of 304 elite lines derived from FPs, i.e., B73, 207, Mo17, and Huangzaosi (HZS), was collected and analyzed using 43,252 single nucleotide polymorphism (SNP) markers. Most IBD segments specific to particular FP groups were identified, including 116 IBD segments in B73, 105 in Mo17, 111 in 207, and 190 in HZS. In these regions, 423 quantitative trait nucleotides (QTNs) associated with 15 agronomic traits and 804 candidate genes were identified. Some known adaptation-related genes, e.g., dwarf8 and vgt1 in HZS, zcn8 and epc in Mo17, and ZmCCT in 207, were validated as being tightly linked to particular IBD segments. In addition, numerous new candidate genes were identified. For example, GRMZM2G154278 in HZS, which belongs to the cell cycle control family, was closely linked to a QTN of the ear height/plant height (EH/PH) trait; GRMZM2G051943 in 207, which encodes an endochitinase precursor (EP) chitinase, was closely linked to a QTN for kernel density; and GRMZM2G170586 in Mo17 was closely linked to a QTN for ear diameter. Complex correlations among these genes were also found. Many IBD segments and genes were included in the formation of FP lines, and complex regulatory networks exist among them. These results provide new insights on the genetic basis of complex traits and provide new candidate IBD regions or genes for the improvement of special traits in maize production.
|
43
|
Liu XQ, Fazio J, Hu P, Paterson AD. Identity-by-descent mapping for diastolic blood pressure in unrelated Mexican Americans. BMC Proc 2016; 10:263-267. [PMID: 27980647 PMCID: PMC5133517 DOI: 10.1186/s12919-016-0041-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023] Open
Abstract
Population-based identity by descent (IBD) mapping is a statistical method for detection of genetic loci that share an ancestral segment among “unrelated” pairs of individuals for a disease. As a complementary method to genome-wide association studies, IBD mapping is robust to allelic heterogeneity and may identify rare inherited variants when combined with sequence data. Our objective is to identify the causal genes for diastolic blood pressure (DBP). We applied a population-based IBD mapping method to 105 unrelated individuals selected from the family data provided for the Genetic Analysis Workshop 19. Using the genome-wide association study data (ie, the microarray data), chromosome 3 was scanned for IBD sharing segments among all pairs of these individuals. At the chromosomal region with the most significant relationship between IBD sharing and DBP, the whole genome sequence data were examined to identify the risk variants for DBP. The most significant chromosomal region that was identified to have a relationship between the IBD sharing and DBP was at 3q12.3 (p = 0.0016), although it did not achieve the chromosome-wide significance level (p = 0.00012). This chromosomal region contains 1 gene, ZPLD1, which has been reported to be associated with cerebral cavernous malformations, a disease with enlarged small blood vessels (capillaries) in the brain. Although 24 deleterious variants were identified at this region, no significant association was found between these variants and DBP (p = 0.40). We presented a mapping strategy which combined a population-based IBD mapping method with sequence data analyses. One gene was located at a chromosomal region identified by this method for DBP. However, further study with a large sample size is needed to assess this result.
Affiliation(s)
- Xiao-Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; The Children's Hospital Research Institute of Manitoba, Winnipeg, MB R3E 3P4 Canada
- Jillian Fazio
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada; George and Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, MB R3A 1R9 Canada
- Andrew D Paterson
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4 Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5G 0A4 Canada
|
44
|
Staples J, Witherspoon D, Jorde L, Nickerson D, Below J, Huff C, Huff CD. PADRE: Pedigree-Aware Distant-Relationship Estimation. Am J Hum Genet 2016; 99:154-62. [PMID: 27374771 DOI: 10.1016/j.ajhg.2016.05.020] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/21/2016] [Accepted: 05/16/2016] [Indexed: 10/21/2022] Open
Abstract
Accurate estimation of shared ancestry is an important component of many genetic studies; current prediction tools accurately estimate pairwise genetic relationships up to the ninth degree. Pedigree-aware distant-relationship estimation (PADRE) combines relationship likelihoods generated by estimation of recent shared ancestry (ERSA) with likelihoods from family networks reconstructed by pedigree reconstruction and identification of a maximum unrelated set (PRIMUS), improving the power to detect distant relationships between pedigrees. Using PADRE, we estimated relationships from simulated pedigrees and three extended pedigrees, correctly predicting 20% more fourth- through ninth-degree simulated relationships than when using ERSA alone. By leveraging pedigree information, PADRE can even identify genealogical relationships between individuals who are genetically unrelated. For example, although 95% of 13th-degree relatives are genetically unrelated, in simulations, PADRE correctly predicted 50% of 13th-degree relationships to within one degree of relatedness. The improvement in prediction accuracy was consistent between simulated and actual pedigrees. We also applied PADRE to the HapMap3 CEU samples and report new cryptic relationships and validation of previously described relationships between families. PADRE greatly expands the range of relationships that can be estimated by using genetic data in pedigrees.
Affiliation(s)
- Chad D Huff
- Department of Epidemiology, The University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.
|
45
|
Schrodi SJ. Reflections on the Field of Human Genetics: A Call for Increased Disease Genetics Theory. Front Genet 2016; 7:106. [PMID: 27375680 PMCID: PMC4896932 DOI: 10.3389/fgene.2016.00106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/26/2016] [Accepted: 05/25/2016] [Indexed: 12/29/2022] Open
Abstract
Development of human genetics theoretical models and the integration of those models with experiment and statistical evaluation are critical for scientific progress. This perspective argues that increased effort in disease genetics theory, complementing experimental and statistical efforts, will escalate the unraveling of molecular etiologies of complex diseases. In particular, the development of new, realistic disease genetics models will help elucidate complex disease pathogenesis, and the predicted patterns in genetic data made by these models will enable the concurrent, more comprehensive statistical testing of multiple aspects of disease genetics predictions, thereby better identifying disease loci. By theoretical human genetics, I intend to encompass all investigations devoted to modeling the heritable architecture underlying disease traits and studies of the resulting principles and dynamics of such models. Hence, the scope of theoretical disease genetics work includes construction and analysis of models describing how disease-predisposing alleles (1) arise, (2) are transmitted across families and populations, and (3) interact with other risk and protective alleles across both the genome and environmental factors to produce disease states. Theoretical work improves insight into viable genetic models of diseases consistent with empirical results from linkage, transmission, and association studies as well as population genetics. Furthermore, understanding the patterns of genetic data expected under realistic disease models will enable more powerful approaches to discover disease-predisposing alleles and additional heritable factors important in common diseases. In spite of the pivotal role of disease genetics theory, such investigation is not particularly vibrant.
Affiliation(s)
- Steven J Schrodi
- Marshfield Clinic Research Foundation, Center for Human Genetics, Marshfield, WI, USA; Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
|
46
|
Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, Bustamante CD, Kenny EE, Williams SM, Aldrich MC, Gravel S. The Great Migration and African-American Genomic Diversity. PLoS Genet 2016; 12:e1006059. [PMID: 27232753 PMCID: PMC4883799 DOI: 10.1371/journal.pgen.1006059] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 12/23/2015] [Accepted: 04/26/2016] [Indexed: 12/23/2022] Open
Abstract
We present a comprehensive assessment of genomic diversity in the African-American population by studying three genotyped cohorts comprising 3,726 African-Americans from across the United States that provide a representative description of the population across all US states and socioeconomic status. An estimated 82.1% of ancestors to African-Americans lived in Africa prior to the advent of transatlantic travel, 16.7% in Europe, and 1.2% in the Americas, with increased African ancestry in the southern United States compared to the North and West. Combining demographic models of ancestry and those of relatedness suggests that admixture occurred predominantly in the South prior to the Civil War and that ancestry-biased migration is responsible for regional differences in ancestry. We find that recent migrations also caused a strong increase in genetic relatedness among geographically distant African-Americans. Long-range relatedness among African-Americans and between African-Americans and European-Americans thus tracks north- and west-bound migration routes followed during the Great Migration of the twentieth century. By contrast, short-range relatedness patterns suggest comparable mobility of ∼15–16 km per generation for African-Americans and European-Americans, as estimated using a novel analytical model of isolation-by-distance.
Genetic studies of African-Americans identify functional variants, elucidate historical and genealogical mysteries, and reveal basic biology. However, African-Americans have been under-represented in genetic studies, and relatively little is known about nation-wide patterns of genomic diversity in the population. Here, we study African-American genomic diversity using genotype data from nationally and regionally representative cohorts. Access to these unique cohorts allows us to clarify the role of population structure, admixture, and recent massive migrations in shaping African-American genomic diversity and sheds new light on the genetic history of this population.
Affiliation(s)
- Soheil Baharian
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Maxime Barakatt
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- School of Computer Science, McGill University, Montreal, Quebec, Canada
- Christopher R. Gignoux
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Suyash Shringarpure
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Jacob Errington
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- William J. Blot
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- International Epidemiology Institute, Rockville, Maryland, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Eimear E. Kenny
- Department of Genetics and Genomic Sciences, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Charles Bronfman Institute for Personalized Medicine, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Icahn Institute for Genomics and Multiscale Biology, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- The Center for Statistical Genetics, The Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Scott M. Williams
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
- Melinda C. Aldrich
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Simon Gravel
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
|
47
|
Friedrichs S, Malzahn D, Pugh EW, Almeida M, Liu XQ, Bailey JN. Filtering genetic variants and placing informative priors based on putative biological function. BMC Genet 2016; 17 Suppl 2:8. [PMID: 26866982 PMCID: PMC4895695 DOI: 10.1186/s12863-015-0313-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
Affiliation(s)
- Stefanie Friedrichs
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
- Dörthe Malzahn
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
- Elizabeth W Pugh
- Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Marcio Almeida
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX, USA.
- Xiao Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Department of Biochemistry and Medical Genetics, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
- Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada.
- Julia N Bailey
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA.
- Epilepsy Genetics/Genomics Laboratory, West Los Angeles Veterans Administration, Los Angeles, CA, USA.
|
48
|
Affiliation(s)
- Sarah Tishkoff
- Departments of Genetics and Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
49
|
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. The application of data-driven searches such as GWAS has not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In recent years, the development of several so-called next-generation sequencing (NGS) platforms has made it possible to generate fast, inexpensive and accurate genomic information. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss the potential and challenges of NGS-based studies.
Affiliation(s)
- Marisa L R Cunha
- Marisa L. R. Cunha, Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel.: +31 20 5662824, Fax: +31 20 6968833
50
|
Binzer S, Stenager E, Binzer M, Kyvik KO, Hillert J, Imrell K. Genetic analysis of the isolated Faroe Islands reveals SORCS3 as a potential multiple sclerosis risk gene. Mult Scler 2015; 22:733-40. [DOI: 10.1177/1352458515602338] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/30/2015] [Accepted: 07/21/2015] [Indexed: 11/15/2022]
Abstract
Background: In search of the missing heritability in multiple sclerosis (MS), additional approaches adding to the genetic discoveries of large genome-wide association studies are warranted. Objective: The objective of this research paper is to search for rare genetic MS risk variants in the genetically homogeneous population of the isolated Faroe Islands. Methods: Twenty-nine Faroese MS cases and 28 controls were genotyped with the HumanOmniExpressExome-chip. The individuals make up 1596 pair-combinations in which we searched for identical-by-descent shared segments using the PLINK-program. Results: A segment spanning 63 SNPs with excess case-case-pair sharing was identified (0.00173 < p < 0.00212). A haplotype consisting of 42 of the 63 identified SNPs, which spanned the entire Sortilin-related vacuolar protein sorting 10 domain containing receptor 3 (SORCS3) gene, had a carrier frequency of 0.34 in cases but was not present in any controls (p = 0.0008). Conclusion: This study revealed an oversharing in case-case-pairs of a segment spanning 63 SNPs and the entire SORCS3. While not previously associated with MS, SORCS3 appears to be important in neuronal plasticity through its binding of neurotrophin factors and involvement in glutamate homeostasis. Although additional work is needed to scrutinise the genetic effect of the SORCS3-covering haplotype, this study suggests that SORCS3 may also be important in MS pathogenesis.
Affiliation(s)
- S Binzer
- Institute of Regional Health Research, University of Southern Denmark, Denmark/Hospital of Southern Jutland, Denmark/Odense Patient data Explorative Network (OPEN), University of Southern Denmark, Denmark/Torshavn National Hospital, Faroe Islands
- E Stenager
- Institute of Regional Health Research, University of Southern Denmark, Denmark/Hospital of Southern Jutland, Denmark/MS Clinic of Southern Jutland (Sønderborg, Esbjerg, Vejle), Department of Neurology, Denmark
- M Binzer
- Institute of Regional Health Research, University of Southern Denmark, Denmark
- KO Kyvik
- Department of Clinical Research, University of Southern Denmark, Denmark/Odense Patient data Explorative Network (OPEN), University of Southern Denmark, Denmark
- J Hillert
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- K Imrell
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
|