1
Markello C, Huang C, Rodriguez A, Carroll A, Chang PC, Eizenga J, Markello T, Haussler D, Paten B. A complete pedigree-based graph workflow for rare candidate variant analysis. Genome Res 2022; 32:893-903. [PMID: 35483961 PMCID: PMC9104704 DOI: 10.1101/gr.276387.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 03/24/2022] [Indexed: 11/24/2022]
Abstract
Methods that use a linear genome reference for genome sequencing data analysis are reference-biased. In the field of clinical genetics for rare diseases, a resulting reduction in genotyping accuracy in some regions has likely prevented the resolution of some cases. Pangenome graphs embed population variation into a reference structure. Although pangenome graphs have helped to reduce reference mapping bias, further performance improvements are possible. We introduce VG-Pedigree, a pedigree-aware workflow based on the pangenome-mapping tool Giraffe and the variant-calling tool DeepTrio, using a model specially trained for Giraffe-based alignments. We demonstrate mapping and variant calling improvements in both single-nucleotide variants (SNVs) and insertion and deletion (indel) variants over those produced by alignments created using BWA-MEM to a linear reference and Giraffe mapping to a pangenome graph containing data from the 1000 Genomes Project. We have also adapted and upgraded deleterious-variant (DV) detecting methods and programs into a streamlined workflow. We used these workflows in combination to detect small lists of candidate DVs among 15 family quartets and quintets of the Undiagnosed Diseases Program (UDP). All candidate DVs that were previously diagnosed using the Mendelian models covered by the previously published methods were recapitulated by these workflows. The results of these experiments indicate that a slightly greater absolute count of DVs is detected in the proband population than in their matched unaffected siblings.
Affiliation(s)
- Charles Markello
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
- Charles Huang
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Alex Rodriguez
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Andrew Carroll
  - Google Incorporated, Mountain View, California 94043, USA
- Pi-Chuan Chang
  - Google Incorporated, Mountain View, California 94043, USA
- Jordan Eizenga
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
- Thomas Markello
  - Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, Santa Cruz, California 95060, USA
2
Weiss K, Hiekkalinna T. Life is a simulation of life - or is it?: What we observe is just one run of a probabilistic process. Evol Anthropol 2017; 26:151-156. [PMID: 28815960 DOI: 10.1002/evan.21522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/10/2017] [Indexed: 11/09/2022]
Affiliation(s)
- Kenneth Weiss
  - Department of Anthropology and Genetics, Penn State University
- Tero Hiekkalinna
  - Genomics and Biomarkers Unit, National Institute for Health and Welfare (THL), Helsinki, Finland
3
Valcarcel A, Grinde K, Cook K, Green A, Tintle N. A multistep approach to single nucleotide polymorphism-set analysis: an evaluation of power and type I error of gene-based tests of association after pathway-based association tests. BMC Proc 2016; 10:349-355. [PMID: 27980661 PMCID: PMC5133510 DOI: 10.1186/s12919-016-0055-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022] Open
Abstract
The aggregation of functionally associated variants given a priori biological information can aid in the discovery of rare variants associated with complex diseases. Many methods exist that aggregate rare variants into a set and compute a single p value summarizing association between the set of rare variants and a phenotype of interest. These methods are often called gene-based, rare variant tests of association because the variants in the set are often all contained within the same gene. A reasonable extension of these approaches involves aggregating variants across an even larger set of variants (e.g., all variants contained in genes within a pathway). Testing sets of variants such as pathways for association with a disease phenotype reduces multiple testing penalties, may increase power, and allows for straightforward biological interpretation. However, a significant variant-set association test does not indicate precisely which variants contained within that set are causal. Because pathways often contain many variants, it may be helpful to follow up significant pathway tests by conducting gene-based tests on each gene in that pathway to narrow in on the region of causal variants. In this paper, we propose such a multistep approach for variant-set analysis that can also account for covariates and complex pedigree structure. We demonstrate this approach on simulated phenotypes from Genetic Analysis Workshop 19. We find generally better power for the multistep approach when compared to a more conventional, single-step approach that simply runs gene-based tests of association on each gene across the genome. Further work is necessary to evaluate the multistep approach on different data sets with different characteristics.
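The two-stage logic described in this abstract (test pathways first, then run gene-based tests only inside significant pathways) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `multistep_follow_up`, the precomputed p-value dictionaries, and the Bonferroni thresholds at each stage. The abstract does not state which association tests or multiple-testing corrections the authors used.

```python
from typing import Dict, List


def multistep_follow_up(
    pathway_pvalues: Dict[str, float],
    pathway_genes: Dict[str, List[str]],
    gene_pvalues: Dict[str, float],
    alpha: float = 0.05,
) -> Dict[str, List[str]]:
    """Return, for each significant pathway, the genes passing follow-up.

    The p-values are assumed to come from some set-based association test;
    this sketch only covers the staged thresholding, not the tests.
    """
    if not pathway_pvalues:
        return {}
    # Stage 1: Bonferroni threshold across pathways (an assumed correction).
    pathway_threshold = alpha / len(pathway_pvalues)
    hits: Dict[str, List[str]] = {}
    for pathway, p_value in pathway_pvalues.items():
        if p_value > pathway_threshold:
            continue
        genes = pathway_genes.get(pathway, [])
        if not genes:
            hits[pathway] = []
            continue
        # Stage 2: correct only for the genes inside this pathway, which is
        # where the reduced multiple-testing penalty comes from.
        gene_threshold = alpha / len(genes)
        hits[pathway] = [g for g in genes if gene_pvalues[g] <= gene_threshold]
    return hits


# Toy input with made-up p-values.
example = multistep_follow_up(
    pathway_pvalues={"P1": 0.001, "P2": 0.40},
    pathway_genes={"P1": ["G1", "G2"], "P2": ["G3"]},
    gene_pvalues={"G1": 0.01, "G2": 0.30, "G3": 0.0001},
)
print(example)
```

In this toy input, gene G3 is never tested because its pathway fails the first stage. That is the trade-off of a staged design: fewer tests per stage, at the risk of missing genes in pathways without a pathway-level signal.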
Affiliation(s)
- Alessandra Valcarcel
  - Department of Statistics, University of Connecticut, 2390 Alumni Drive, Storrs, CT 06269 USA
- Kelsey Grinde
  - Department of Biostatistics, University of Washington, NE Pacific St, Seattle, WA 98195 USA
- Kaitlyn Cook
  - Department of Mathematics and Statistics, Carleton College, 1 N College St, Northfield, MN 55057 USA
- Alden Green
  - Department of Statistics, Harvard University, Massachusetts Hall, Cambridge, MA 02138 USA
- Nathan Tintle
  - Department of Mathematics, Statistics and Computer Science, Dordt College, 498 4th Ave. NE, Sioux Center, IA 51250 USA
4
Lee S, Choi S, Kim YJ, Kim BJ, Hwang H, Park T. Pathway-based approach using hierarchical components of collapsed rare variants. Bioinformatics 2016; 32:i586-i594. [PMID: 27587678 PMCID: PMC5013912 DOI: 10.1093/bioinformatics/btw425] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION To address the 'missing heritability' issue, many statistical methods for pathway-based analyses using rare variants have been proposed to analyze pathways individually. However, neglecting correlations between multiple pathways can result in misleading solutions, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants. It constructs a single hierarchical model that consists of collapsed gene-level summaries and pathways, and it analyzes entire pathways simultaneously by imposing ridge-type penalties on both gene and pathway coefficient estimates; hence our method considers the correlation of pathways without being constrained by a multiple testing problem. RESULTS Through simulation studies, the proposed method was shown to have higher statistical power than the existing pathway-based methods. In addition, our method was applied to large-scale whole-exome sequencing data with levels of a liver enzyme using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also successfully detected biologically plausible pathways for a phenotype of interest. These findings were successfully replicated by an independent large-scale exome chip study. AVAILABILITY AND IMPLEMENTATION An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/ CONTACT tspark@stats.snu.ac.kr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
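As a rough illustration of what a "collapsed gene-level summary" can look like, the sketch below sums minor-allele counts over the rare variants of one gene for each sample. This is a generic burden-style collapse, not PHARAOH's actual implementation: the unweighted sum, the 1% frequency cutoff, and the function name `collapse_gene` are all assumptions, and the ridge-penalized hierarchical model itself is not sketched.

```python
from typing import List, Sequence


def collapse_gene(
    genotypes: Sequence[Sequence[int]],
    minor_allele_freqs: Sequence[float],
    maf_cutoff: float = 0.01,
) -> List[int]:
    """Collapse one gene's rare variants into a per-sample score.

    genotypes[i][j] is the minor-allele count (0, 1, or 2) of sample i at
    variant j; minor_allele_freqs[j] is that variant's frequency.
    The cutoff and the unweighted sum are illustrative assumptions.
    """
    # Keep only variants rarer than the cutoff.
    rare = [j for j, freq in enumerate(minor_allele_freqs) if freq < maf_cutoff]
    # Sum the minor-allele counts over the retained variants.
    return [sum(row[j] for j in rare) for row in genotypes]


# Three samples, three variants; the second variant is common and is dropped.
scores = collapse_gene(
    genotypes=[[0, 1, 2], [1, 0, 0], [0, 0, 1]],
    minor_allele_freqs=[0.005, 0.2, 0.008],
)
print(scores)
```

Scores of this kind would then serve as the gene-level inputs to a pathway-level model.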
Affiliation(s)
- Sungyoung Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Sungkyoung Choi
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
- Young Jin Kim
  - Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea
- Bong-Jo Kim
  - Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul 151-747, Korea
5
Lin KH, Zöllner S. Robust and Powerful Affected Sibpair Test for Rare Variant Association. Genet Epidemiol 2015; 39:325-33. [PMID: 25966809 DOI: 10.1002/gepi.21903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2014] [Revised: 03/25/2015] [Accepted: 04/01/2015] [Indexed: 11/09/2022]
Abstract
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case-control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family-based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family-based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case-control study for variants with summed risk allele frequency f < 0.05; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene-gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.
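The core comparison in this abstract, rare-allele counts on IBD-shared versus nonshared chromosomes, can be sketched as follows. This is not the published TRAFIC statistic, which the abstract does not spell out. It is a simplified stand-in that assumes IBD status and phase are known without error, collapses each chromosome region to a carrier/non-carrier indicator, and applies a pooled two-proportion z-test. The function name and the input layout are hypothetical.

```python
import math
from typing import Iterable, Tuple


def shared_vs_nonshared_test(
    sibpairs: Iterable[Tuple[int, int, int]],
) -> Tuple[float, float]:
    """Pooled two-proportion z-test on rare-allele carrier rates.

    Each tuple is (ibd, shared_carriers, nonshared_carriers) for one
    affected sibpair at one region:
      ibd                 number of chromosomes shared IBD (0, 1, or 2)
      shared_carriers     shared chromosomes carrying a rare allele
      nonshared_carriers  nonshared chromosomes carrying a rare allele
    Returns (z, two-sided p-value).
    """
    n_shared = x_shared = n_nonshared = x_nonshared = 0
    for ibd, shared_carriers, nonshared_carriers in sibpairs:
        if ibd not in (0, 1, 2):
            raise ValueError("ibd must be 0, 1, or 2")
        # A sibpair has four chromosomes; each IBD-shared chromosome is
        # present in both siblings, so 4 - 2 * ibd remain nonshared.
        nonshared = 4 - 2 * ibd
        if not 0 <= shared_carriers <= ibd:
            raise ValueError("shared_carriers out of range")
        if not 0 <= nonshared_carriers <= nonshared:
            raise ValueError("nonshared_carriers out of range")
        n_shared += ibd
        x_shared += shared_carriers
        n_nonshared += nonshared
        x_nonshared += nonshared_carriers
    if n_shared == 0 or n_nonshared == 0:
        raise ValueError("need both shared and nonshared chromosomes")
    p_shared = x_shared / n_shared
    p_nonshared = x_nonshared / n_nonshared
    pooled = (x_shared + x_nonshared) / (n_shared + n_nonshared)
    variance = pooled * (1 - pooled) * (1 / n_shared + 1 / n_nonshared)
    if variance == 0:
        return 0.0, 1.0  # no variation in carrier status, nothing to test
    z = (p_shared - p_nonshared) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Toy data: four sibpairs with made-up carrier counts.
z, p_value = shared_vs_nonshared_test([(1, 1, 0), (2, 1, 0), (0, 0, 1), (1, 1, 0)])
print(round(z, 3), round(p_value, 4))
```

Because the shared chromosomes act as cases and the nonshared ones as controls within the same family, both groups come from the same ancestry background, which is the intuition behind the robustness to stratification claimed above. With counts as small as in the toy data, an exact test would be more appropriate than a normal approximation.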
Affiliation(s)
- Keng-Han Lin
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
6
Guo W, Shugart YY. The power comparison of the haplotype-based collapsing tests and the variant-based collapsing tests for detecting rare variants in pedigrees. BMC Genomics 2014; 15:632. [PMID: 25070353 PMCID: PMC4131059 DOI: 10.1186/1471-2164-15-632] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/18/2013] [Accepted: 07/18/2014] [Indexed: 11/20/2022] Open
Abstract
Background Both common and rare genetic variants have been shown to contribute to the etiology of complex diseases. Recent genome-wide association studies (GWAS) have successfully investigated how common variants contribute to the genetic factors associated with common human diseases. However, understanding the impact of rare variants, which are abundant in the human population (one in every 17 bases), remains challenging. A number of statistical tests have been developed to analyze collapsed rare variants identified by association tests. Here, we propose a haplotype-based approach. This work is inspired by an existing statistical framework, the pedigree disequilibrium test (PDT), which uses genetic data to assess the effects of variants in general pedigrees. We aim to compare the performance of the haplotype-based approach with that of the rare variant-based approach for detecting rare causal variants in pedigrees. Results Extensive simulations in the sequencing setting were carried out to evaluate and compare the haplotype-based approach with the rare variant methods that drew on a more conventional collapsing strategy. As assessed through a variety of scenarios, the haplotype-based pedigree tests had enhanced statistical power compared with the rare variant-based pedigree tests when the disease of interest was mainly caused by rare haplotypes (with multiple rare alleles), and vice versa when disease was caused by rare variants acting independently. In most other situations, when disease was caused both by haplotypes with multiple rare alleles and by rare variants with similar effects, the two approaches provided similar power in testing for association. Conclusions The haplotype-based approach was designed to assess the role of rare and potentially causal haplotypes. The proposed rare variant-based pedigree tests were designed to assess the role of rare and potentially causal variants. This study clearly documented the situations under which either method performs better than the other. All tests have been implemented in a software package named rvHPDT, which was submitted to the Comprehensive R Archive Network (CRAN) for general use.
Affiliation(s)
- Yin Yao Shugart
  - Division of Intramural Division Program, National Institute of Mental Health, National Institutes of Health, 35 Convent Drive, Bethesda, MD 20892, USA
7
Amish revisited: next-generation sequencing studies of psychiatric disorders among the Plain people. Trends Genet 2013; 29:412-8. [PMID: 23422049 DOI: 10.1016/j.tig.2013.01.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 09/20/2012] [Revised: 01/08/2013] [Accepted: 01/22/2013] [Indexed: 11/23/2022]
Abstract
The rapid development of next-generation sequencing (NGS) technology has led to renewed interest in the potential contribution of rarer forms of genetic variation to complex non-Mendelian phenotypes such as psychiatric illnesses. Although challenging, family-based studies offer some advantages, especially in communities with large families and a limited number of founders. Here we revisit family-based studies of mental illnesses in traditional Amish and Mennonite communities, known collectively as the Plain people. We discuss the new opportunities for NGS in these populations, with particular emphasis on investigating psychiatric disorders. We also address some of the challenges facing NGS-based studies of complex phenotypes in founder populations.