26
|
Jia X, Goes FS, Locke AE, Palmer D, Wang W, Cohen-Woods S, Genovese G, Jackson AU, Jiang C, Kvale M, Mullins N, Nguyen H, Pirooznia M, Rivera M, Ruderfer DM, Shen L, Thai K, Zawistowski M, Zhuang Y, Abecasis G, Akil H, Bergen S, Burmeister M, Chapman S, DelaBastide M, Juréus A, Kang HM, Kwok PY, Li JZ, Levy SE, Monson ET, Moran J, Sobell J, Watson S, Willour V, Zöllner S, Adolfsson R, Blackwood D, Boehnke M, Breen G, Corvin A, Craddock N, DiFlorio A, Hultman CM, Landen M, Lewis C, McCarroll SA, Richard McCombie W, McGuffin P, McIntosh A, McQuillin A, Morris D, Myers RM, O'Donovan M, Ophoff R, Boks M, Kahn R, Ouwehand W, Owen M, Pato C, Pato M, Posthuma D, Potash JB, Reif A, Sklar P, Smoller J, Sullivan PF, Vincent J, Walters J, Neale B, Purcell S, Risch N, Schaefer C, Stahl EA, Zandi PP, Scott LJ. Investigating rare pathogenic/likely pathogenic exonic variation in bipolar disorder. Mol Psychiatry 2021; 26:5239-5250. [PMID: 33483695 PMCID: PMC8295400 DOI: 10.1038/s41380-020-01006-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/02/2019] [Revised: 12/14/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Bipolar disorder (BD) is a serious mental illness with substantial common variant heritability. However, the role of rare coding variation in BD is not well established. We examined the protein-coding (exonic) sequences of 3,987 unrelated individuals with BD and 5,322 controls of predominantly European ancestry across four cohorts from the Bipolar Sequencing Consortium (BSC). We assessed the burden of rare, protein-altering, single nucleotide variants classified as pathogenic or likely pathogenic (P-LP) both exome-wide and within several groups of genes with phenotypic or biologic plausibility in BD. While we observed an increased burden of rare coding P-LP variants within 165 genes identified as BD GWAS regions in 3,987 BD cases (meta-analysis OR = 1.9, 95% CI = 1.3-2.8, one-sided p = 6.0 × 10⁻⁴), this enrichment did not replicate in an additional 9,929 BD cases and 14,018 controls (OR = 0.9, one-sided p = 0.70). Although BD shares common variant heritability with schizophrenia, in the BSC sample we did not observe a significant enrichment of P-LP variants in SCZ GWAS genes, in two classes of neuronal synaptic genes (RBFOX2 and FMRP) associated with SCZ or in loss-of-function intolerant genes. In this study, the largest analysis of exonic variation in BD, individuals with BD do not carry a replicable enrichment of rare P-LP variants across the exome or in any of several groups of genes with biologic plausibility. Moreover, despite a strong shared susceptibility between BD and SCZ through common genetic variation, we do not observe an association between BD risk and rare P-LP coding variants in genes known to modulate risk for SCZ.
27
|
Si Y, Vanderwerff B, Zöllner S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 2021; 217:iyab011. [PMID: 33686438 PMCID: PMC8049559 DOI: 10.1093/genetics/iyab011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/09/2020] [Accepted: 12/15/2020] [Indexed: 01/13/2023] Open
Abstract
Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide theoretical consideration of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles mostly depends on the allele count in the reference sample. We summarize the impact of this error on genotype imputation on association tests by calculating the r² between imputed and true genotype and show that even when modeling other sources of error, the impact of the model misspecification has a significant impact on the r² of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, modeling misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.
28
|
Dutta D, VandeHaar P, Fritsche LG, Zöllner S, Boehnke M, Scott LJ, Lee S. A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank. Am J Hum Genet 2021; 108:669-681. [PMID: 33730541 DOI: 10.1016/j.ajhg.2021.02.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/23/2020] [Accepted: 02/19/2021] [Indexed: 02/06/2023] Open
Abstract
Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.
29
|
Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee SB, Tian X, Browning BL, Das S, Emde AK, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen YDI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin KH, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O'Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo JS, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng LC, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, Zöllner S, 
Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O'Connor TD, Abecasis GR. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021; 590:290-299. [PMID: 33568819 PMCID: PMC7875770 DOI: 10.1038/s41586-021-03205-y] [Citation(s) in RCA: 860] [Impact Index Per Article: 286.7] [Received: 03/06/2019] [Accepted: 01/07/2021] [Indexed: 02/08/2023]
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
30
|
Cochran AL, Nieser KJ, Forger DB, Zöllner S, McInnis MG. Gene-set Enrichment with Mathematical Biology (GEMB). Gigascience 2020; 9:giaa091. [PMID: 33034635 PMCID: PMC7546080 DOI: 10.1093/gigascience/giaa091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/14/2020] [Revised: 06/01/2020] [Accepted: 08/14/2020] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND: Gene-set analyses measure the association between a disease of interest and a "set" of genes related to a biological pathway. These analyses often incorporate gene network properties to account for differential contributions of each gene. We extend this concept further, defining gene contributions based on biophysical properties, by leveraging mathematical models of biology to predict the effects of genetic perturbations on a particular downstream function. RESULTS: We present a method that combines gene weights from model predictions and gene ranks from genome-wide association studies into a weighted gene-set test. We demonstrate in simulation how such a method can improve statistical power. To this effect, we identify a gene set, weighted by model-predicted contributions to intracellular calcium ion concentration, that is significantly related to bipolar disorder in a small dataset (P = 0.04; n = 544). We reproduce this finding using publicly available summary data from the Psychiatric Genomics Consortium (P = 1.7 × 10⁻⁴; n = 41,653). By contrast, an approach using a general calcium signaling pathway did not detect a significant association with bipolar disorder (P = 0.08). The weighted gene-set approach based on intracellular calcium ion concentration did not detect a significant relationship with schizophrenia (P = 0.09; n = 65,967) or major depression disorder (P = 0.30; n = 500,199). CONCLUSIONS: Together, these findings show how incorporating math biology into gene-set analyses might help to identify biological functions that underlie certain polygenic disorders.
31
|
Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE, Wang H, Daya M, Ziniti J, Datta S, Celedón JC, Soto-Quiros ME, Avila L, Weiss ST, Barnes K, Redline SS, Vasan RS, Johnson AD, Mathias RA, Hernandez R, Wilson JG, Nickerson DA, Abecasis G, Browning SR, Zöllner S, O'Connell JR, Mitchell BD, O'Connor TD. De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc Natl Acad Sci U S A 2020; 117:2560-2569. [PMID: 31964835 PMCID: PMC7007577 DOI: 10.1073/pnas.1902766117] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/19/2022] Open
Abstract
De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.
32
|
Narisu N, Rothwell R, Vrtačnik P, Rodríguez S, Didion J, Zöllner S, Erdos MR, Collins FS, Eriksson M. Analysis of somatic mutations identifies signs of selection during in vitro aging of primary dermal fibroblasts. Aging Cell 2019; 18:e13010. [PMID: 31385397 PMCID: PMC6826141 DOI: 10.1111/acel.13010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/17/2018] [Revised: 06/20/2019] [Accepted: 06/30/2019] [Indexed: 12/13/2022] Open
Abstract
Somatic mutations are critical for cancer development and may play a role in age-related functional decline. Here, we used deep sequencing to analyze the prevalence of somatic mutations during in vitro cell aging. Primary dermal fibroblasts from healthy subjects of young and advanced age, from Hutchinson-Gilford progeria syndrome and from xeroderma pigmentosum complementation groups A and C, were first restricted in number and then expanded in vitro. DNA was obtained from cells pre- and post-expansion and sequenced at high depth (1656× mean coverage), over a cumulative 290 kb target region, including the exons of 44 aging-related genes. Allele frequencies of 58 somatic mutations differed between the pre- and post-cell culture expansion passages. Mathematical modeling revealed that the frequency change of three of the 58 mutations was unlikely to be explained by genetic drift alone, indicative of positive selection. Two of these three mutations, CDKN2A c.53C>T (T18M) and ERCC8 c.*772T>A, were identified in cells from a patient with XPA. The allele frequency of the CDKN2A mutation increased from 0% to 55.3% with increasing cell culture passage. The third mutation, BRCA2 c.6222C>T (H2074H), was identified in a sample from a healthy individual of advanced age. However, further validation of the three mutations suggests that other unmeasured variants probably provide the selective advantage in these cells. Our results reinforce the notions that somatic mutations occur during aging and that some are under positive selection, supporting the model of increased tissue heterogeneity with increased age.
33
|
Kowalski MH, Qian H, Hou Z, Rosen JD, Tapia AL, Shan Y, Jain D, Argos M, Arnett DK, Avery C, Barnes KC, Becker LC, Bien SA, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Buyske S, Cai J, Cho MH, Choi SH, Choquet H, Cupples LA, Cushman M, Daya M, de Vries PS, Ellinor PT, Faraday N, Fornage M, Gabriel S, Ganesh SK, Graff M, Gupta N, He J, Heckbert SR, Hidalgo B, Hodonsky CJ, Irvin MR, Johnson AD, Jorgenson E, Kaplan R, Kardia SLR, Kelly TN, Kooperberg C, Lasky-Su JA, Loos RJF, Lubitz SA, Mathias RA, McHugh CP, Montgomery C, Moon JY, Morrison AC, Palmer ND, Pankratz N, Papanicolaou GJ, Peralta JM, Peyser PA, Rich SS, Rotter JI, Silverman EK, Smith JA, Smith NL, Taylor KD, Thornton TA, Tiwari HK, Tracy RP, Wang T, Weiss ST, Weng LC, Wiggins KL, Wilson JG, Yanek LR, Zöllner S, North KE, Auer PL, Raffield LM, Reiner AP, Li Y. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet 2019; 15:e1008500. [PMID: 31869403 PMCID: PMC6953885 DOI: 10.1371/journal.pgen.1008500] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 06/06/2019] [Revised: 01/10/2020] [Accepted: 10/30/2019] [Indexed: 01/10/2023] Open
Abstract
Most genome-wide association and fine-mapping studies to date have been conducted in individuals of European descent, and genetic studies of populations of Hispanic/Latino and African ancestry are limited. In addition, these populations have more complex linkage disequilibrium structure. In order to better define the genetic architecture of these understudied populations, we leveraged >100,000 phased sequences available from deep-coverage whole genome sequencing through the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) program to impute genotypes into admixed African and Hispanic/Latino samples with genome-wide genotyping array data. We demonstrated that using TOPMed sequencing data as the imputation reference panel improves genotype imputation quality in these populations, which subsequently enhanced gene-mapping power for complex traits. For rare variants with minor allele frequency (MAF) < 0.5%, we observed a 2.3- to 6.1-fold increase in the number of well-imputed variants, with 11-34% improvement in average imputation quality, compared to the state-of-the-art 1000 Genomes Project Phase 3 and Haplotype Reference Consortium reference panels. Impressively, even for extremely rare variants with minor allele count <10 (including singletons) in the imputation target samples, average information content rescued was >86%. Subsequent association analyses of TOPMed reference panel-imputed genotype data with hematological traits (hemoglobin (HGB), hematocrit (HCT), and white blood cell count (WBC)) in ~21,600 African-ancestry and ~21,700 Hispanic/Latino individuals identified associations with two rare variants in the HBB gene (rs33930165 with higher WBC [p = 8.8 × 10⁻¹⁵] in African populations, rs11549407 with lower HGB [p = 1.5 × 10⁻¹²] and HCT [p = 8.8 × 10⁻¹⁰] in Hispanics/Latinos). By comparison, neither variant would have been genome-wide significant if either 1000 Genomes Project Phase 3 or Haplotype Reference Consortium reference panels had been used for imputation. Our findings highlight the utility of the TOPMed imputation reference panel for identification of novel rare variant associations not previously detected in similarly sized genome-wide studies of under-represented African and Hispanic/Latino populations.
34
|
Stahl EA, Breen G, Forstner AJ, McQuillin A, Ripke S, Trubetskoy V, Mattheisen M, Wang Y, Coleman JRI, Gaspar HA, de Leeuw CA, Steinberg S, Pavlides JMW, Trzaskowski M, Byrne EM, Pers TH, Holmans PA, Richards AL, Abbott L, Agerbo E, Akil H, Albani D, Alliey-Rodriguez N, Als TD, Anjorin A, Antilla V, Awasthi S, Badner JA, Bækvad-Hansen M, Barchas JD, Bass N, Bauer M, Belliveau R, Bergen SE, Pedersen CB, Bøen E, Boks MP, Boocock J, Budde M, Bunney W, Burmeister M, Bybjerg-Grauholm J, Byerley W, Casas M, Cerrato F, Cervantes P, Chambert K, Charney AW, Chen D, Churchhouse C, Clarke TK, Coryell W, Craig DW, Cruceanu C, Curtis D, Czerski PM, Dale AM, de Jong S, Degenhardt F, Del-Favero J, DePaulo JR, Djurovic S, Dobbyn AL, Dumont A, Elvsåshagen T, Escott-Price V, Fan CC, Fischer SB, Flickinger M, Foroud TM, Forty L, Frank J, Fraser C, Freimer NB, Frisén L, Gade K, Gage D, Garnham J, Giambartolomei C, Pedersen MG, Goldstein J, Gordon SD, Gordon-Smith K, Green EK, Green MJ, Greenwood TA, Grove J, Guan W, Guzman-Parra J, Hamshere ML, Hautzinger M, Heilbronner U, Herms S, Hipolito M, Hoffmann P, Holland D, Huckins L, Jamain S, Johnson JS, Juréus A, Kandaswamy R, Karlsson R, Kennedy JL, Kittel-Schneider S, Knowles JA, Kogevinas M, Koller AC, Kupka R, Lavebratt C, Lawrence J, Lawson WB, Leber M, Lee PH, Levy SE, Li JZ, Liu C, Lucae S, Maaser A, MacIntyre DJ, Mahon PB, Maier W, Martinsson L, McCarroll S, McGuffin P, McInnis MG, McKay JD, Medeiros H, Medland SE, Meng F, Milani L, Montgomery GW, Morris DW, Mühleisen TW, Mullins N, Nguyen H, Nievergelt CM, Adolfsson AN, Nwulia EA, O'Donovan C, Loohuis LMO, Ori APS, Oruc L, Ösby U, Perlis RH, Perry A, Pfennig A, Potash JB, Purcell SM, Regeer EJ, Reif A, Reinbold CS, Rice JP, Rivas F, Rivera M, Roussos P, Ruderfer DM, Ryu E, Sánchez-Mora C, Schatzberg AF, Scheftner WA, Schork NJ, Shannon Weickert C, Shehktman T, Shilling PD, Sigurdsson E, Slaney C, Smeland OB, Sobell JL, Søholm Hansen C, Spijker AT, St Clair D, Steffens M, Strauss 
JS, Streit F, Strohmaier J, Szelinger S, Thompson RC, Thorgeirsson TE, Treutlein J, Vedder H, Wang W, Watson SJ, Weickert TW, Witt SH, Xi S, Xu W, Young AH, Zandi P, Zhang P, Zöllner S, Adolfsson R, Agartz I, Alda M, Backlund L, Baune BT, Bellivier F, Berrettini WH, Biernacka JM, Blackwood DHR, Boehnke M, Børglum AD, Corvin A, Craddock N, Daly MJ, Dannlowski U, Esko T, Etain B, Frye M, Fullerton JM, Gershon ES, Gill M, Goes F, Grigoroiu-Serbanescu M, Hauser J, Hougaard DM, Hultman CM, Jones I, Jones LA, Kahn RS, Kirov G, Landén M, Leboyer M, Lewis CM, Li QS, Lissowska J, Martin NG, Mayoral F, McElroy SL, McIntosh AM, McMahon FJ, Melle I, Metspalu A, Mitchell PB, Morken G, Mors O, Mortensen PB, Müller-Myhsok B, Myers RM, Neale BM, Nimgaonkar V, Nordentoft M, Nöthen MM, O'Donovan MC, Oedegaard KJ, Owen MJ, Paciga SA, Pato C, Pato MT, Posthuma D, Ramos-Quiroga JA, Ribasés M, Rietschel M, Rouleau GA, Schalling M, Schofield PR, Schulze TG, Serretti A, Smoller JW, Stefansson H, Stefansson K, Stordal E, Sullivan PF, Turecki G, Vaaler AE, Vieta E, Vincent JB, Werge T, Nurnberger JI, Wray NR, Di Florio A, Edenberg HJ, Cichon S, Ophoff RA, Scott LJ, Andreassen OA, Kelsoe J, Sklar P. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat Genet 2019; 51:793-803. [PMID: 31043756 PMCID: PMC6956732 DOI: 10.1038/s41588-019-0397-8] [Citation(s) in RCA: 901] [Impact Index Per Article: 180.2] [Received: 03/11/2018] [Accepted: 03/18/2019] [Indexed: 12/18/2022]
Abstract
Bipolar disorder is a highly heritable psychiatric disorder. We performed a genome-wide association study (GWAS) including 20,352 cases and 31,358 controls of European descent, with follow-up analysis of 822 variants with P < 1 × 10⁻⁴ in an additional 9,412 cases and 137,760 controls. Eight of the 19 variants that were genome-wide significant (P < 5 × 10⁻⁸) in the discovery GWAS were not genome-wide significant in the combined analysis, consistent with small effect sizes and limited power but also with genetic heterogeneity. In the combined analysis, 30 loci were genome-wide significant, including 20 newly identified loci. The significant loci contain genes encoding ion channels, neurotransmitter transporters and synaptic components. Pathway analysis revealed nine significantly enriched gene sets, including regulation of insulin secretion and endocannabinoid signaling. Bipolar I disorder is strongly genetically correlated with schizophrenia, driven by psychosis, whereas bipolar II disorder is more strongly correlated with major depressive disorder. These findings address key clinical questions and provide potential biological mechanisms for bipolar disorder.
35
|
Budde M, Friedrichs S, Alliey-Rodriguez N, Ament S, Badner JA, Berrettini WH, Bloss CS, Byerley W, Cichon S, Comes AL, Coryell W, Craig DW, Degenhardt F, Edenberg HJ, Foroud T, Forstner AJ, Frank J, Gershon ES, Goes FS, Greenwood TA, Guo Y, Hipolito M, Hood L, Keating BJ, Koller DL, Lawson WB, Liu C, Mahon PB, McInnis MG, McMahon FJ, Meier SM, Mühleisen TW, Murray SS, Nievergelt CM, Nurnberger JI, Nwulia EA, Potash JB, Quarless D, Rice J, Roach JC, Scheftner WA, Schork NJ, Shekhtman T, Shilling PD, Smith EN, Streit F, Strohmaier J, Szelinger S, Treutlein J, Witt SH, Zandi PP, Zhang P, Zöllner S, Bickeböller H, Falkai PG, Kelsoe JR, Nöthen MM, Rietschel M, Schulze TG, Malzahn D. Efficient region-based test strategy uncovers genetic risk factors for functional outcome in bipolar disorder. Eur Neuropsychopharmacol 2019; 29:156-170. [PMID: 30503783 DOI: 10.1016/j.euroneuro.2018.10.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/12/2018] [Revised: 10/16/2018] [Accepted: 10/23/2018] [Indexed: 11/21/2022]
Abstract
Genome-wide association studies of case-control status have advanced the understanding of the genetic basis of psychiatric disorders. Further progress may be gained by increasing sample size but also by new analysis strategies that advance the exploitation of existing data, especially for clinically important quantitative phenotypes. The functionally-informed efficient region-based test strategy (FIERS) introduced herein uses prior knowledge on biological function and dependence of genotypes within a powerful statistical framework with improved sensitivity and specificity for detecting consistent genetic effects across studies. As proof of concept, FIERS was used for the first genome-wide single nucleotide polymorphism (SNP)-based investigation on bipolar disorder (BD) that focuses on an important aspect of disease course, the functional outcome. FIERS identified a significantly associated locus on chromosome 15 (hg38: chr15:48965004 - 49464789 bp) with consistent effect strength between two independent studies (GAIN/TGen: European Americans, BOMA: Germans; n = 1592 BD patients in total). Protective and risk haplotypes were found on the most strongly associated SNPs. They contain a CTCF binding site (rs586758); CTCF sites are known to regulate sets of genes within a chromatin domain. The rs586758 - rs2086256 - rs1904317 haplotype is located in the promoter flanking region of the COPS2 gene, close to microRNA4716, and the EID1, SHC4, DTWD1 genes as plausible biological candidates. While implication with BD is novel, COPS2, EID1, and SHC4 are known to be relevant for neuronal differentiation and function and DTWD1 for psychopharmacological side effects. The test strategy FIERS that enabled this discovery is equally applicable for tag SNPs and sequence data.
36
|
Carlson J, Li JZ, Zöllner S. Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets. BMC Genomics 2018; 19:845. [PMID: 30486787 PMCID: PMC6263557 DOI: 10.1186/s12864-018-5264-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/07/2018] [Accepted: 11/19/2018] [Indexed: 12/14/2022] Open
Abstract
Background: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants. Results: We introduce Helmsman, a program designed to perform mutation signature analysis on arbitrarily large sequencing datasets. Helmsman is up to 300 times faster than existing software. Helmsman's memory usage is independent of the number of variants, resulting in a small enough memory footprint to analyze datasets that would otherwise exceed the memory limitations of other programs. Conclusions: Helmsman is a computationally efficient tool that enables users to evaluate mutational signatures in massive sequencing datasets that are otherwise intractable with existing software. Helmsman is freely available at https://github.com/carjed/helmsman.
Collapse
|
37
|
Breuer R, Mattheisen M, Frank J, Krumm B, Treutlein J, Kassem L, Strohmaier J, Herms S, Mühleisen TW, Degenhardt F, Cichon S, Nöthen MM, Karypis G, Kelsoe J, Greenwood T, Nievergelt C, Shilling P, Shekhtman T, Edenberg H, Craig D, Szelinger S, Nurnberger J, Gershon E, Alliey-Rodriguez N, Zandi P, Goes F, Schork N, Smith E, Koller D, Zhang P, Badner J, Berrettini W, Bloss C, Byerley W, Coryell W, Foroud T, Guo Y, Hipolito M, Keating B, Lawson W, Liu C, Mahon P, McInnis M, Murray S, Nwulia E, Potash J, Rice J, Scheftner W, Zöllner S, McMahon FJ, Rietschel M, Schulze TG. Detecting significant genotype-phenotype association rules in bipolar disorder: market research meets complex genetics. Int J Bipolar Disord 2018; 6:24. [PMID: 30415424 PMCID: PMC6230336 DOI: 10.1186/s40345-018-0132-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/19/2018] [Accepted: 08/22/2018] [Indexed: 12/21/2022]
Abstract
Background Disentangling the etiology of common, complex diseases is a major challenge in genetic research. For bipolar disorder (BD), several genome-wide association studies (GWAS) have been performed. Similar to other complex disorders, major breakthroughs in explaining the high heritability of BD through GWAS have remained elusive. To overcome this dilemma, genetic research into BD has embraced a variety of strategies such as the formation of large consortia to increase sample size and sequencing approaches. Here we advocate a complementary approach making use of already existing GWAS data: a novel data mining procedure to identify yet undetected genotype–phenotype relationships. We adapted association rule mining, a data mining technique traditionally used in retail market research, to identify frequent and characteristic genotype patterns showing strong associations to phenotype clusters. We applied this strategy to three independent GWAS datasets from 2835 phenotypically characterized patients with BD. In a discovery step, 20,882 candidate association rules were extracted. Results Two of these rules (one associated with eating disorder and the other with anxiety) remained significant in an independent dataset after robust correction for multiple testing. Both showed considerable effect sizes (odds ratio ~ 3.4 and 3.0, respectively) and support previously reported molecular biological findings. Conclusion Our approach detected novel specific genotype–phenotype relationships in BD that were missed by standard analyses like GWAS. While we developed and applied our method within the context of BD gene discovery, it may facilitate identifying highly specific genotype–phenotype relationships in subsets of genome-wide data sets of other complex phenotypes with similar epidemiological properties and challenges to gene discovery efforts.
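As an illustration of the association rule mining idea mentioned in the abstract above, the following minimal sketch computes the three standard rule metrics (support, confidence, lift) for a rule of the form "genotype pattern → phenotype". It is not the authors' pipeline: the record layout, the item labels such as `snp1=AA`, and the function name `rule_metrics` are all hypothetical and chosen only for this example.

```python
from typing import Dict, FrozenSet, List


def rule_metrics(
    records: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent.

    Each record is the set of items (hypothetical genotype and phenotype
    labels) observed for one individual.
    """
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)   # antecedent present
    n_c = sum(1 for r in records if consequent <= r)   # consequent present
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)  # both

    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the baseline phenotype rate.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy data, invented for illustration only.
toy_records = [
    frozenset({"snp1=AA", "snp2=CT", "anxiety"}),
    frozenset({"snp1=AA", "snp2=CT", "anxiety"}),
    frozenset({"snp1=AA", "snp2=CC"}),
    frozenset({"snp1=AG", "snp2=CT", "anxiety"}),
    frozenset({"snp1=AG", "snp2=CC"}),
    frozenset({"snp1=AA", "snp2=CT"}),
]

metrics = rule_metrics(
    toy_records,
    antecedent=frozenset({"snp1=AA", "snp2=CT"}),
    consequent=frozenset({"anxiety"}),
)
```

A lift above 1 means the phenotype is more frequent among carriers of the genotype pattern than in the whole sample; a real analysis would additionally need the replication and multiple-testing correction the abstract describes.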
|
38
|
Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, Zöllner S. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 2018; 9:3753. [PMID: 30218074 PMCID: PMC6138700 DOI: 10.1038/s41467-018-05936-5] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/04/2017] [Accepted: 07/30/2018] [Indexed: 12/30/2022]
Abstract
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms. Remarkably, GC content, DNase hypersensitivity, CpG islands, and H3K36 trimethylation are associated with both increased and decreased mutation rates depending on nucleotide context. We validate these estimated effects in an independent dataset of ~46,000 de novo mutations, and confirm our estimates are more accurate than previously published results based on ancestrally older variants without considering genomic features. Our results thus provide the most refined portrait to date of the factors contributing to genome-wide variability of the human germline mutation rate.
|
39
|
Reppell M, Zöllner S. An efficient algorithm for generating the internal branches of a Kingman coalescent. Theor Popul Biol 2018; 122:57-66. [PMID: 28709926 PMCID: PMC5764821 DOI: 10.1016/j.tpb.2017.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/22/2016] [Revised: 05/19/2017] [Accepted: 05/26/2017] [Indexed: 01/16/2023]
Abstract
Coalescent simulations are a widely used approach for simulating sample genealogies, but can become computationally burdensome in large samples. Methods exist to analytically calculate a sample's expected frequency spectrum without simulating full genealogies. However, statistics that rely on the distribution of the length of internal coalescent branches, such as the probability that two mutations of equal size arose on the same genealogical branch, have previously required full coalescent simulations to estimate. Here, we present a sampling method capable of efficiently generating limited portions of sample genealogies using a series of analytic equations that give probabilities for the number, start, and end of internal branches conditional on the number of final samples they subtend. These equations are independent of the coalescent waiting times and need only be calculated a single time, lending themselves to efficient computation. We compare our method with full coalescent simulations to show the resulting distribution of branch lengths and summary statistics are equivalent, but that for many conditions our method is at least 10 times faster.
|
40
|
Prossin AR, Chandler M, Ryan KA, Saunders EF, Kamali M, Papadopoulos V, Zöllner S, Dantzer R, McInnis MG. Functional TSPO polymorphism predicts variance in the diurnal cortisol rhythm in bipolar disorder. Psychoneuroendocrinology 2018; 89:194-202. [PMID: 29414032 PMCID: PMC6048960 DOI: 10.1016/j.psyneuen.2018.01.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/13/2017] [Revised: 01/11/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
INTRODUCTION Psychosocial stress contributes to onset/exacerbation of mood episodes and alcohol use, suggesting dysregulated diurnal cortisol rhythms underlie episodic exacerbations in Bipolar Disorder (BD). However, mechanisms underlying dysregulated HPA rhythms in BD and alcohol use disorders (AUD) are understudied. Knowledge of associated variance factors has great clinical translational potential by facilitating development of strategies to reduce stress-related relapse in BD and AUD. Evidence suggests structural changes to mitochondrial translocator protein (TSPO) (a regulator of steroid synthesis) due to the single nucleotide polymorphism rs6971 may explain much of this variance. However, whether rs6971 is associated with abnormal HPA rhythms and clinical exacerbation in humans is unknown. METHODS To show this common TSPO polymorphism impacts HPA rhythms in BD, we tested whether rs6971 (dichotomized: presence/absence of polymorphism) predicted variance in diurnal cortisol rhythm (saliva: morning and evening for 3 days) in 107 volunteers with BD (50 with and 57 without AUD) and 28 healthy volunteers of similar age and ethno-demographic distribution. RESULTS Repeated measures ANOVA confirmed effects of BD (F(5,525) = 3.0, p = 0.010) and AUD (F(5,525) = 2.9, p = 0.012), but not TSPO polymorphism (p > 0.05). Interactions were confirmed for TSPO × BD (F(5,525) = 3.9, p = 0.002) and for TSPO × AUD (F(5,525) = 2.8, p = 0.017). DISCUSSION We identified differences in diurnal cortisol rhythm depending on presence/absence of common TSPO polymorphism in BD volunteers with or without AUD and healthy volunteers. These results have wide-ranging implications, but further validation is needed prior to optimal clinical translation.
|
41
|
Boyce M, Warrington S, Cortezi B, Zöllner S, Vauléon S, Swinkels DW, Summo L, Schwoebel F, Riecke K. Safety, pharmacokinetics and pharmacodynamics of the anti-hepcidin Spiegelmer lexaptepid pegol in healthy subjects. Br J Pharmacol 2016; 173:1580-8. [PMID: 26773325 PMCID: PMC4842915 DOI: 10.1111/bph.13433] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 06/04/2015] [Revised: 01/11/2016] [Accepted: 01/11/2016] [Indexed: 01/16/2023]
Abstract
BACKGROUND AND PURPOSE Anaemia of chronic disease is characterized by impaired erythropoiesis due to functional iron deficiency, often caused by excessive hepcidin. Lexaptepid pegol, a pegylated structured l-oligoribonucleotide, binds and inactivates hepcidin. EXPERIMENTAL APPROACH We conducted a placebo-controlled study on the safety, pharmacokinetics and pharmacodynamics of lexaptepid after single and repeated i.v. and s.c. administration to 64 healthy subjects at doses from 0.3 to 4.8 mg·kg⁻¹. KEY RESULTS After treatment with lexaptepid, serum iron concentration and transferrin increased dose-dependently. Iron increased from approximately 20 μmol·L⁻¹ at baseline by 67% at 8 h after i.v. infusion of 1.2 mg·kg⁻¹ lexaptepid. The pharmacokinetics showed dose-proportional increases in peak plasma concentrations and moderately over-proportional increases in systemic exposure. Lexaptepid had no effect on hepcidin production or anti-drug antibodies. Treatment with lexaptepid was generally safe and well tolerated, with mild and transient transaminase increases at doses ≥2.4 mg·kg⁻¹ and with local injection site reactions after s.c. but not after i.v. administration. CONCLUSIONS AND IMPLICATIONS Lexaptepid pegol inhibited hepcidin and dose-dependently raised serum iron and transferrin saturation. The compound is being further developed to treat anaemia of chronic disease.
|
42
|
Li M, Rothwell R, Vermaat M, Wachsmuth M, Schröder R, Laros JFJ, van Oven M, de Bakker PIW, Bovenberg JA, van Duijn CM, van Ommen GJB, Slagboom PE, Swertz MA, Wijmenga C, Kayser M, Boomsma DI, Zöllner S, de Knijff P, Stoneking M. Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: support for a variable-size bottleneck. Genome Res 2016; 26:417-26. [PMID: 26916109 PMCID: PMC4817766 DOI: 10.1101/gr.203216.115] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 12/11/2015] [Accepted: 01/21/2016] [Indexed: 12/17/2022]
Abstract
Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genomes of the Netherlands (GoNL) data, which consists of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection during transmission against novel heteroplasmies (in which the minor allele has never been observed in polymorphism data). These novel heteroplasmies are enhanced for tRNA and rRNA genes, and mutations associated with mtDNA diseases frequently occur in these genes. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies.
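To make the bottleneck idea in the abstract above concrete, here is a minimal sketch of a fixed-size sampling model: the offspring's heteroplasmy frequency is the fraction of N transmitted mtDNA genomes that carry the minor allele, drawn binomially from the maternal frequency. This is a deliberate simplification and an assumption of this example, not the variable-size likelihood model the authors fitted; the function names are hypothetical.

```python
import random
from typing import List


def transmit(maternal_maf: float, bottleneck_size: int, rng: random.Random) -> float:
    """Offspring minor allele frequency after sampling `bottleneck_size` genomes."""
    carriers = sum(1 for _ in range(bottleneck_size) if rng.random() < maternal_maf)
    return carriers / bottleneck_size


def simulate_offspring(
    maternal_maf: float, bottleneck_size: int, n_offspring: int, seed: int = 0
) -> List[float]:
    """Simulate independent transmissions from mothers with the same MAF."""
    rng = random.Random(seed)
    return [transmit(maternal_maf, bottleneck_size, rng) for _ in range(n_offspring)]


def expected_variance(maternal_maf: float, bottleneck_size: int) -> float:
    """Binomial sampling variance of the offspring MAF: p(1 - p) / N."""
    return maternal_maf * (1.0 - maternal_maf) / bottleneck_size


# Example run with a bottleneck of nine genomes, the mean size reported above.
offspring_mafs = simulate_offspring(maternal_maf=0.2, bottleneck_size=9, n_offspring=20000)
```

Under this simplified model a smaller bottleneck gives a larger spread of offspring frequencies around the maternal value, which is the kind of signal a likelihood method can use to estimate the number of transmitted genomes.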
|
43
|
Tang CS, Zhang H, Cheung CYY, Xu M, Ho JCY, Zhou W, Cherny SS, Zhang Y, Holmen O, Au KW, Yu H, Xu L, Jia J, Porsch RM, Sun L, Xu W, Zheng H, Wong LY, Mu Y, Dou J, Fong CHY, Wang S, Hong X, Dong L, Liao Y, Wang J, Lam LSM, Su X, Yan H, Yang ML, Chen J, Siu CW, Xie G, Woo YC, Wu Y, Tan KCB, Hveem K, Cheung BMY, Zöllner S, Xu A, Eugene Chen Y, Jiang CQ, Zhang Y, Lam TH, Ganesh SK, Huo Y, Sham PC, Lam KSL, Willer CJ, Tse HF, Gao W. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese. Nat Commun 2015; 6:10206. [PMID: 26690388 PMCID: PMC4703860 DOI: 10.1038/ncomms10206] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 08/26/2015] [Accepted: 11/13/2015] [Indexed: 12/19/2022]
Abstract
Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P < 2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci (PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser) also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P < 0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. An important risk factor for coronary artery disease is the level of blood lipids. Here the authors conduct an exome-wide association study in Chinese cohorts and identify three novel loci associated with lipid levels as well as three Asian-specific variants in known loci.
|
44
|
Lo Y, Zhang L, Foxman B, Zöllner S. Whole-genome sequencing of uropathogenic Escherichia coli reveals long evolutionary history of diversity and virulence. Infect Genet Evol 2015; 34:244-50. [PMID: 26112070 DOI: 10.1016/j.meegid.2015.06.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/04/2015] [Revised: 06/17/2015] [Accepted: 06/20/2015] [Indexed: 01/07/2023]
Abstract
Uropathogenic Escherichia coli (UPEC) are phenotypically and genotypically very diverse. This diversity makes it challenging to understand the evolution of UPEC adaptations responsible for causing urinary tract infections (UTI). To gain insight into the relationship between evolutionary divergence and adaptive paths to uropathogenicity, we sequenced at deep coverage (190×) the genomes of 19 E. coli strains from urinary tract infection patients from the same geographic area. Our sample consisted of 14 UPEC isolates and 5 non-UTI-causing (commensal) rectal E. coli isolates. After identifying strain variants using de novo assembly-based methods, we clustered the strains based on pairwise sequence differences using a neighbor-joining algorithm. We examined evolutionary signals on the whole-genome phylogeny and contrasted these signals with those found on gene trees constructed based on specific uropathogenic virulence factors. The whole-genome phylogeny showed that the divergence between UPEC and commensal E. coli strains without known UPEC virulence factors happened over 32 million generations ago. Pairwise diversity between any two strains was also high, suggesting multiple genetic origins of uropathogenic strains in a small geographic region. Contrasting the whole-genome phylogeny with three gene trees constructed from common uropathogenic virulence factors, we detected no selective advantage of these virulence genes over other genomic regions. These results suggest that UPEC acquired uropathogenicity a long time ago and used it opportunistically to cause extraintestinal infections.
|
45
|
Lin KH, Zöllner S. Robust and Powerful Affected Sibpair Test for Rare Variant Association. Genet Epidemiol 2015; 39:325-33. [PMID: 25966809 DOI: 10.1002/gepi.21903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2014] [Revised: 03/25/2015] [Accepted: 04/01/2015] [Indexed: 11/09/2022]
Abstract
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case-control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family-based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family-based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for the effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case-control study for variants with summed risk allele frequency f < 0.05; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene-gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.
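The comparison described in the abstract above (rare-allele counts on chromosomes shared identical by descent versus nonshared chromosomes) can be illustrated with a simple two-proportion z-test. This sketch is an assumption-laden stand-in, not the published TRAFIC statistic or software: it assumes the IBD state of each sibpair at the region is known, that a pair with IBD state k contributes k shared and 4 − 2k nonshared distinct chromosomes, and all function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def chromosome_counts(ibd_states: Iterable[int]) -> Tuple[int, int]:
    """Count distinct shared and nonshared chromosomes across sibpairs.

    A sibpair with IBD state k (0, 1 or 2) carries k shared chromosomes
    and 4 - 2k nonshared chromosomes at the region.
    """
    states = list(ibd_states)
    shared = sum(states)
    nonshared = sum(4 - 2 * k for k in states)
    return shared, nonshared


def two_proportion_z(
    alleles_shared: int, n_shared: int, alleles_nonshared: int, n_nonshared: int
) -> Tuple[float, float]:
    """Pooled z statistic and one-sided p-value for enrichment on shared chromosomes."""
    p_shared = alleles_shared / n_shared
    p_nonshared = alleles_nonshared / n_nonshared
    pooled = (alleles_shared + alleles_nonshared) / (n_shared + n_nonshared)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_shared + 1.0 / n_nonshared))
    if se == 0.0:
        return 0.0, 0.5  # no variation observed, so no evidence either way
    z = (p_shared - p_nonshared) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    return z, p_one_sided
```

Because the shared and nonshared chromosomes come from the same sibpairs, the nonshared chromosomes act as an internal control, which is the intuition behind the robustness to population stratification that the abstract describes.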
|
46
|
Lo Y, Kang HM, Nelson MR, Othman MI, Chissoe SL, Ehm MG, Abecasis GR, Zöllner S. Comparing variant calling algorithms for target-exon sequencing in a large sample. BMC Bioinformatics 2015; 16:75. [PMID: 25884587 PMCID: PMC4359451 DOI: 10.1186/s12859-015-0489-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/09/2014] [Accepted: 02/03/2015] [Indexed: 12/30/2022]
Abstract
BACKGROUND Sequencing studies of exonic regions aim to identify rare variants contributing to complex traits. With high coverage and large sample size, these studies tend to apply simple variant calling algorithms. However, coverage is often heterogeneous; sites with insufficient coverage may benefit from sophisticated calling algorithms used in low-coverage sequencing studies. We evaluate the potential benefits of different calling strategies by performing a comparative analysis of variant calling methods on exonic data from 202 genes sequenced at 24x in 7,842 individuals. We call variants using individual-based, population-based and linkage disequilibrium (LD)-aware methods with stringent quality control. We measure genotype accuracy by the concordance with on-target GWAS genotypes and between 80 pairs of sequencing replicates. We validate selected singleton variants using capillary sequencing. RESULTS Using these calling methods, we detected over 27,500 variants at the targeted exons; >57% were singletons. The singletons identified by individual-based analyses were of the highest quality. However, individual-based analyses generated more missing genotypes (4.72%) than population-based (0.47%) and LD-aware (0.17%) analyses. Moreover, individual-based genotypes were the least concordant with array-based genotypes and replicates. Population-based genotypes were less concordant than genotypes from LD-aware analyses with extended haplotypes. We reanalyzed the same dataset with a second set of callers and showed again that the individual-based caller identified more high-quality singletons than the population-based caller. We also replicated this result in a second dataset of 57 genes sequenced at 127.5x in 3,124 individuals. CONCLUSIONS We recommend population-based analyses for high quality variant calls with few missing genotypes. With extended haplotypes, LD-aware methods generate the most accurate and complete genotypes. 
In addition, individual-based analyses should complement the above methods to obtain the most singleton variants.
|
47
|
Maier R, Moser G, Chen GB, Ripke S, Coryell W, Potash JB, Scheftner WA, Shi J, Weissman MM, Hultman CM, Landén M, Levinson DF, Kendler KS, Smoller JW, Wray NR, Lee SH, Absher D, Agartz I, Akil H, Amin F, Andreassen O, Anjorin A, Anney R, Arking D, Asherson P, Azevedo M, Backlund L, Badner J, Bailey A, Banaschewski T, Barchas J, Barnes M, Barrett T, Bass N, Battaglia A, Bauer M, Bayés M, Bellivier F, Bergen S, Berrettini W, Betancur C, Bettecken T, Biederman J, Binder E, Black D, Blackwood D, Bloss C, Boehnke M, Boomsma D, Breen G, Breuer R, Bruggeman R, Buccola N, Buitelaar J, Bunney W, Buxbaum J, Byerley W, Caesar S, Cahn W, Cantor R, Casas M, Chakravarti A, Chambert K, Choudhury K, Cichon S, Cloninger C, Collier D, Cook E, Coon H, Cormand B, Cormican P, Corvin A, Coryell W, Craddock N, Craig D, Craig I, Crosbie J, Cuccaro M, Curtis D, Czamara D, Daly M, Datta S, Dawson G, Day R, De Geus E, Degenhardt F, Devlin B, Djurovic S, Donohoe G, Doyle A, Duan J, Dudbridge F, Duketis E, Ebstein R, Edenberg H, Elia J, Ennis S, Etain B, Fanous A, Faraone S, Farmer A, Ferrier I, Flickinger M, Fombonne E, Foroud T, Frank J, Franke B, Fraser C, Freedman R, Freimer N, Freitag C, Friedl M, Frisén L, Gallagher L, Gejman P, Georgieva L, Gershon E, Geschwind D, Giegling I, Gill M, Gordon S, Gordon-Smith K, Green E, Greenwood T, Grice D, Gross M, Grozeva D, Guan W, Gurling H, De Haan L, Haines J, Hakonarson H, Hallmayer J, Hamilton S, Hamshere M, Hansen T, Hartmann A, Hautzinger M, Heath A, Henders A, Herms S, Hickie I, Hipolito M, Hoefels S, Holmans P, Holsboer F, Hoogendijk W, Hottenga JJ, Hultman C, Hus V, Ingason A, Ising M, Jamain S, Jones I, Jones L, Kähler A, Kahn R, Kandaswamy R, Keller M, Kelsoe J, Kendler K, Kennedy J, Kenny E, Kent L, Kim Y, Kirov G, Klauck S, Klei L, Knowles J, Kohli M, Koller D, Konte B, Korszun A, Krabbendam L, Krasucki R, Kuntsi J, Kwan P, Landén M, Långström N, Lathrop M, Lawrence J, Lawson W, Leboyer M, Ledbetter D, Lee P, Lencz T, Lesch KP, Levinson 
D, Lewis C, Li J, Lichtenstein P, Lieberman J, Lin DY, Linszen D, Liu C, Lohoff F, Loo S, Lord C, Lowe J, Lucae S, MacIntyre D, Madden P, Maestrini E, Magnusson P, Mahon P, Maier W, Malhotra A, Mane S, Martin C, Martin N, Mattheisen M, Matthews K, Mattingsdal M, McCarroll S, McGhee K, McGough J, McGrath P, McGuffin P, McInnis M, McIntosh A, McKinney R, McLean A, McMahon F, McMahon W, McQuillin A, Medeiros H, Medland S, Meier S, Melle I, Meng F, Meyer J, Middeldorp C, Middleton L, Milanova V, Miranda A, Monaco A, Montgomery G, Moran J, Moreno-De-Luca D, Morken G, Morris D, Morrow E, Moskvina V, Mowry B, Muglia P, Mühleisen T, Müller-Myhsok B, Murtha M, Myers R, Myin-Germeys I, Neale B, Nelson S, Nievergelt C, Nikolov I, Nimgaonkar V, Nolen W, Nöthen M, Nurnberger J, Nwulia E, Nyholt D, O’Donovan M, O’Dushlaine C, Oades R, Olincy A, Oliveira G, Olsen L, Ophoff R, Osby U, Owen M, Palotie A, Parr J, Paterson A, Pato C, Pato M, Penninx B, Pergadia M, Pericak-Vance M, Perlis R, Pickard B, Pimm J, Piven J, Posthuma D, Potash J, Poustka F, Propping P, Purcell S, Puri V, Quested D, Quinn E, Ramos-Quiroga J, Rasmussen H, Raychaudhuri S, Rehnström K, Reif A, Ribasés M, Rice J, Rietschel M, Ripke S, Roeder K, Roeyers H, Rossin L, Rothenberger A, Rouleau G, Ruderfer D, Rujescu D, Sanders A, Sanders S, Santangelo S, Schachar R, Schalling M, Schatzberg A, Scheftner W, Schellenberg G, Scherer S, Schork N, Schulze T, Schumacher J, Schwarz M, Scolnick E, Scott L, Sergeant J, Shi J, Shilling P, Shyn S, Silverman J, Sklar P, Slager S, Smalley S, Smit J, Smith E, Smoller J, Sonuga-Barke E, St Clair D, State M, Steffens M, Steinhausen HC, Strauss J, Strohmaier J, Stroup T, Sullivan P, Sutcliffe J, Szatmari P, Szelinger S, Thapar A, Thirumalai S, Thompson R, Todorov A, Tozzi F, Treutlein J, Tzeng JY, Uhr M, van den Oord E, Van Grootheest G, Van Os J, Vicente A, Vieland V, Vincent J, Visscher P, Walsh C, Wassink T, Watson S, Weiss L, Weissman M, Werge T, Wienker T, Wiersma D, Wijsman E, 
Willemsen G, Williams N, Willsey A, Witt S, Wray N, Xu W, Young A, Yu T, Zammit S, Zandi P, Zhang P, Zitman F, Zöllner S. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet 2015; 96:283-94. [PMID: 25640677 PMCID: PMC4320268 DOI: 10.1016/j.ajhg.2014.12.006] [Citation(s) in RCA: 163] [Impact Index Per Article: 18.1] [Received: 10/23/2014] [Accepted: 12/08/2014] [Indexed: 12/11/2022]
Abstract
Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders using single trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample size, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.
|
48
|
Wang C, Zhan X, Bragg-Gresham J, Kang HM, Stambolian D, Chew EY, Branham KE, Heckenlively J, Fulton R, Wilson RK, Mardis ER, Lin X, Swaroop A, Zöllner S, Abecasis GR. Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet 2014; 46:409-15. [PMID: 24633160 PMCID: PMC4084909 DOI: 10.1038/ng.2924] [Citation(s) in RCA: 105] [Impact Index Per Article: 10.5] [Received: 08/08/2013] [Accepted: 02/21/2014] [Indexed: 12/15/2022]
Abstract
Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.
|
49
|
Moroi SE, Raoof DA, Reed DM, Zöllner S, Qin Z, Richards JE. Progress toward personalized medicine for glaucoma. Expert Rev Ophthalmol 2014; 4:145-161. [PMID: 23914252 DOI: 10.1586/eop.09.6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
How will you respond when a patient asks, "Doctor, what can I do to prevent myself from going blind from glaucoma like mom?" There is optimism that genetic profiling will help target patients to individualized treatments based on validated disease risk alleles, validated pharmacogenetic markers and behavioral modification. Personalized medicine will become a reality through identification of disease and pharmacogenetic markers, followed by careful study of how to employ this information in order to improve treatment outcomes. With advances in genomic technologies, research has shifted from the simple monogenic disease model to a complex multigenic and environmental disease model to answer these questions. Our challenges lie in developing risk models that incorporate gene-gene interactions, gene copy-number variations, environmental interactions, treatment effects and clinical covariates.
|
50
|
Zawistowski M, Reppell M, Wegmann D, St Jean PL, Ehm MG, Nelson MR, Novembre J, Zöllner S. Analysis of rare variant population structure in Europeans explains differential stratification of gene-based tests. Eur J Hum Genet 2014; 22:1137-44. [PMID: 24398795 DOI: 10.1038/ejhg.2013.297] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 08/30/2013] [Revised: 11/27/2013] [Accepted: 11/28/2013] [Indexed: 11/09/2022]
Abstract
There is substantial interest in the role of rare genetic variants in the etiology of complex human diseases. Several gene-based tests have been developed to simultaneously analyze multiple rare variants for association with phenotypic traits. The tests can largely be partitioned into two classes - 'burden' tests and 'joint' tests - based on how they accumulate evidence of association across sites. We used the empirical joint site frequency spectra of rare, nonsynonymous variation from a large multi-population sequencing study to explore the effect of realistic rare variant population structure on gene-based tests. We observed an important difference between the two test classes: their susceptibility to population stratification. Focusing on European samples, we found that joint tests, which allow variants to have opposite directions of effect, consistently showed higher levels of P-value inflation than burden tests. We determined that the differential stratification was caused by two specific patterns in the interpopulation distribution of rare variants, each correlating with inflation in one of the test classes. The pattern that inflates joint tests is more prevalent in real data, explaining the higher levels of inflation in these tests. Furthermore, we show that the different sources of inflation between tests lead to heterogeneous responses to genomic control correction and the number of variants analyzed. Our results indicate that care must be taken when interpreting joint and burden analyses of the same set of rare variants, in particular, to avoid mistaking inflated P-values in joint tests for stronger signals of true associations.
|