1
Liang X, Duan Q, Li B, Wang Y, Bu Y, Zhang Y, Kuang Z, Mao L, An X, Wang H, Yang X, Wan N, Feng Z, Shen W, Miao W, Chen J, Liu S, Storz JF, Liu J, Nevo E, Li K. Genomic structural variation contributes to evolved changes in gene expression in high-altitude Tibetan sheep. Proc Natl Acad Sci U S A 2024; 121:e2322291121. [PMID: 38913905 PMCID: PMC11228492 DOI: 10.1073/pnas.2322291121] [Received: 01/10/2024] [Accepted: 05/06/2024] [Indexed: 06/26/2024]
Abstract
Tibetan sheep were introduced to the Qinghai-Tibet Plateau roughly 3,000 years B.P., making this species a good model for investigating genetic mechanisms of high-altitude adaptation over a relatively short timescale. Here, we characterize genomic structural variants (SVs) that distinguish Tibetan sheep from closely related, low-altitude Hu sheep, and we examine associated changes in tissue-specific gene expression. We document differentiation between the two sheep breeds in frequencies of SVs associated with genes involved in cardiac function and circulation. In Tibetan sheep, we identified high-frequency SVs in a total of 462 genes, including EPAS1, PAPSS2, and PTPRD. Single-cell RNA-Seq data and luciferase reporter assays revealed that the SVs had cis-acting effects on the expression levels of these three genes in specific tissues and cell types. In Tibetan sheep, we identified a high-frequency chromosomal inversion that exhibited modified chromatin architectures relative to the noninverted allele that predominates in Hu sheep. The inversion harbors several genes with altered expression patterns related to heart protection, brown adipocyte proliferation, angiogenesis, and DNA repair. These findings indicate that SVs represent an important source of genetic variation in gene expression and may have contributed to high-altitude adaptation in Tibetan sheep.
Affiliation(s)
- Xiaolong Liang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Qijiao Duan
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Bowen Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yinjia Wang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yueting Bu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yonglu Zhang
  - Fengjia Town Health Center, Rushan City, Weihai City 264200, China
- Zhuoran Kuang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Leyan Mao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Xuan An
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Huihua Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaojie Yang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Na Wan
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Zhilong Feng
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Wei Shen
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Weilan Miao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Jiaqi Chen
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Sanyuan Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Jay F. Storz
  - School of Biological Sciences, University of Nebraska, Lincoln, NE 68588
- Jianquan Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Eviatar Nevo
  - Institute of Evolution, University of Haifa, Haifa 3498838, Israel
- Kexin Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
2
Eynard SE, Klopp C, Canale-Tabet K, Marande W, Vandecasteele C, Roques C, Donnadieu C, Boone Q, Servin B, Vignal A. The black honey bee genome: insights on specific structural elements and a first step towards pangenomes. Genet Sel Evol 2024; 56:51. [PMID: 38943059 PMCID: PMC11212449 DOI: 10.1186/s12711-024-00917-3] [Received: 01/13/2024] [Accepted: 06/04/2024] [Indexed: 07/01/2024]
Abstract
BACKGROUND The honey bee reference genome, HAv3.1, was produced from a commercial line sample that was thought to have a largely dominant Apis mellifera ligustica genetic background. Apis mellifera mellifera, often referred to as the black bee, has a separate evolutionary history and is the original type in western and northern Europe. Growing interest in this subspecies for conservation and non-professional apicultural practices, together with the need to decipher genome backgrounds in hybrids, created the need for a specific genome assembly. Moreover, having several high-quality genomes is becoming key for taking structural variations into account in pangenome analyses. RESULTS Pacific Biosciences long reads were produced from a single haploid black bee drone. Contigs were scaffolded into chromosomes using a high-density genetic map. This allowed re-estimation of the recombination rate, which had been over-estimated in some previous studies because mis-assemblies produced spurious inversions in the older reference genomes. The sequence continuity obtained was very high, and the only limit to continuous chromosome-wide sequences seemed to be tandem repeat arrays that were usually longer than 10 kb and belonged to two main families, the 371-bp and 91-bp repeats, which caused problems in the assembly process because of their high internal sequence similarity. Our assembly was used together with the reference genome to genotype two structural variants by a pangenome graph approach with Graphtyper2. Genotypes obtained were either correct or missing when compared to an approach based on sequencing depth analysis, and genotyping rates were 89% and 76% for the two variants. CONCLUSIONS Our new assembly for the Apis mellifera mellifera honey bee subspecies demonstrates the utility of multiple high-quality genomes for the genotyping of structural variants, with a test case on two insertions and deletions.
It will therefore be an invaluable resource for future studies, for instance by including structural variants in GWAS. Using a single haploid drone for sequencing allowed a refined analysis of very large tandem repeat arrays, raising the question of their function in the genome. High-quality genome assemblies for multiple subspecies, such as the one presented here, are crucial for emerging projects using pangenomes.
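The comparison described in this abstract (graph-based genotypes checked against depth-based genotypes, with a genotyping rate per variant) can be sketched as follows. This is a minimal illustration on made-up toy calls, not the authors' pipeline: the function name `genotyping_summary`, the allele coding, and the data are all assumptions.

```python
def genotyping_summary(graph_calls, depth_calls):
    """Return (genotyping_rate, concordance) for paired per-sample calls.

    graph_calls / depth_calls: lists of genotype labels, with None for a
    missing call. Both names are hypothetical, chosen for this sketch.
    """
    if len(graph_calls) != len(depth_calls):
        raise ValueError("call lists must have the same length")
    # Genotyping rate: fraction of samples with a non-missing graph call.
    called = [(g, d) for g, d in zip(graph_calls, depth_calls) if g is not None]
    rate = len(called) / len(graph_calls)
    # Concordance: agreement with the depth-based call where both exist.
    compared = [(g, d) for g, d in called if d is not None]
    concordance = (
        sum(g == d for g, d in compared) / len(compared) if compared else float("nan")
    )
    return rate, concordance


# Toy data only: 0 = reference allele, 1 = alternative allele, None = missing.
graph = [0, 1, None, 1, 0, None, 1, 0, 1, 1]
depth = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
rate, conc = genotyping_summary(graph, depth)
print(f"genotyping rate = {rate:.0%}, concordance = {conc:.0%}")
```

A result where every called genotype agrees but some calls are missing corresponds to the "either correct or missing" pattern the abstract reports.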
Affiliation(s)
- Sonia E Eynard
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Kamila Canale-Tabet
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Céline Roques
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Quentin Boone
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
  - Sigenae, MIAT, INRAE, Castanet Tolosan, France
- Bertrand Servin
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
- Alain Vignal
  - GenPhySE, Université de Toulouse, INRAE, INPT, INP-ENVT, Castanet Tolosan, France
3
Hayes V, Gong T, Jiang J, Bornman R, Gheybi K, Stricker P, Weischenfeldt J, Mutambirwa S. Rare pathogenic structural variants show potential to enhance prostate cancer germline testing for African men. Research Square 2024:rs.3.rs-4531885. [PMID: 38947031 PMCID: PMC11213160 DOI: 10.21203/rs.3.rs-4531885/v1] [Indexed: 07/02/2024]
Abstract
Prostate cancer (PCa) is highly heritable, with men of African ancestry at greatest risk of both disease and associated lethality. Lack of representation in genomic data means that germline testing guidelines exclude African men. Although structural variations (SVs) are established as major contributors to human disease and prostate tumourigenesis, their role is under-appreciated in familial and therapeutic testing. Utilising a clinico-methodologically matched African (n = 113) versus European (n = 57) deep-sequenced PCa resource, we interrogated 42,966 high-quality germline SVs using a best-fit pathogenicity prediction workflow. We identified 15 potentially pathogenic SVs representing 12.4% of African and 7.0% of European patients, of which 72% and 86% met germline testing standard-of-care recommendations, respectively. Notable African-specific loss-of-function gene candidates include the DNA damage repair genes MLH1 and BARD1 and the tumour suppressors FOXP1, WASF1 and RB1. Representing only a fraction of the vast African diaspora, this study raises considerations with respect to the contribution of kilo-to-mega-base rare variants to PCa pathogenicity and African-associated disparity.
Affiliation(s)
- Jue Jiang
  - Garvan Institute of Medical Research
4
Skuladottir AT, Stefansdottir L, Halldorsson GH, Stefansson OA, Bjornsdottir A, Jonsson P, Palmadottir V, Thorgeirsson TE, Walters GB, Gisladottir RS, Bjornsdottir G, Jonsdottir GA, Sulem P, Gudbjartsson DF, Knowlton KU, Jones DA, Ottas A, Pedersen OB, Didriksen M, Brunak S, Banasik K, Hansen TF, Erikstrup C, Haavik J, Andreassen OA, Rye D, Igland J, Ostrowski SR, Milani LA, Nadauld LD, Stefansson H, Stefansson K. GWAS meta-analysis reveals key risk loci in essential tremor pathogenesis. Commun Biol 2024; 7:504. [PMID: 38671141 PMCID: PMC11053069 DOI: 10.1038/s42003-024-06207-4] [Received: 12/04/2023] [Accepted: 04/17/2024] [Indexed: 04/28/2024]
Abstract
Essential tremor (ET) is a prevalent neurological disorder with a largely unknown underlying biology. In this genome-wide association study meta-analysis, comprising 16,480 ET cases and 1,936,173 controls from seven datasets, we identify 12 sequence variants at 11 loci. Evaluating mRNA expression, splicing, plasma protein levels, and coding effects, we highlight seven putative causal genes at these loci, including CA3 and CPLX1. CA3 encodes Carbonic Anhydrase III, and carbonic anhydrase inhibitors have been shown to decrease tremors. CPLX1, encoding Complexin-1, regulates neurotransmitter release. Through gene-set enrichment analysis, we identify a significant association with specific cell types, including dopaminergic and GABAergic neurons, as well as biological processes such as Rho GTPase signaling. Genetic correlation analyses reveal a positive association between ET and Parkinson's disease, depression, and anxiety-related phenotypes. This research uncovers risk loci, enhancing our knowledge of the complex genetics of this common but poorly understood disorder, and highlights CA3 and CPLX1 as potential therapeutic targets.
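The abstract does not say how the seven datasets were combined. A common choice for GWAS meta-analysis is a fixed-effect, inverse-variance-weighted estimate per variant, sketched below as an assumption rather than as the authors' method. The function name `ivw_meta` and the effect sizes are illustrative only.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-dataset effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se). Hypothetical helper for illustration.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need equally sized, non-empty inputs")
    weights = [1.0 / se**2 for se in ses]  # weight = 1 / variance
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


# Toy inputs: two datasets with equal precision.
beta, se = ivw_meta([0.10, 0.20], [0.1, 0.1])
print(f"pooled beta = {beta:.3f}, pooled SE = {se:.4f}, z = {beta / se:.2f}")
```

With equal standard errors the pooled effect is the plain average, and the pooled standard error shrinks by a factor of the square root of the number of datasets.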
Affiliation(s)
- Astros Th Skuladottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Palmi Jonsson
  - Faculty of Medicine, University of Iceland, Reykjavik, Iceland
  - Department of Geriatric Medicine, Landspitali University Hospital, Reykjavik, Iceland
- Vala Palmadottir
  - Department of Internal Medicine, Landspitali University Hospital, Reykjavik, Iceland
- Rosa S Gisladottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Icelandic and Comparative Cultural Studies, University of Iceland, Reykjavik, Iceland
- Daniel F Gudbjartsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Kirk U Knowlton
  - Intermountain Medical Center, Intermountain Heart Institute, Salt Lake City, USA
- David A Jones
  - Precision Genomics, Intermountain Healthcare, Saint George, Utah, USA
- Aigar Ottas
  - Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Ole B Pedersen
  - Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Maria Didriksen
  - Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Søren Brunak
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Karina Banasik
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Thomas Folkmann Hansen
  - Danish Headache Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Copenhagen, Denmark
- Christian Erikstrup
  - Department of Clinical Immunology, Aarhus University Hospital, Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, Aarhus University, Aarhus, Denmark
- Jan Haavik
  - Department of Biomedicine, University of Bergen, Bergen, Norway
  - Bergen Center of Brain Plasticity, Division of Psychiatry, Haukeland University Hospital, Bergen, Norway
- Ole A Andreassen
  - Institute of Clinical Medicine, University of Oslo, Oslo, Norway
  - NORMENT, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
- David Rye
  - Emory Department of Neurology, Wesley Woods Health Center, Atlanta, GA, USA
- Jannicke Igland
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
  - Department of Health and Caring Sciences, Western Norway University of Applied Sciences, Bergen, Norway
- Sisse Rye Ostrowski
  - Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Lili A Milani
  - Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Lincoln D Nadauld
  - Precision Genomics, Intermountain Healthcare, Saint George, Utah, USA
  - Stanford University, School of Medicine, Stanford, CA, USA
- Kari Stefansson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, Reykjavik, Iceland
5
Beaumont RN, Hawkes G, Gunning AC, Wright CF. Clustering of predicted loss-of-function variants in genes linked with monogenic disease can explain incomplete penetrance. Genome Med 2024; 16:64. [PMID: 38671509 PMCID: PMC11046769 DOI: 10.1186/s13073-024-01333-4] [Received: 10/19/2023] [Accepted: 03/22/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND Genetic variants that severely alter protein products (e.g. nonsense, frameshift) are often associated with disease. For some genes, these predicted loss-of-function variants (pLoFs) are observed throughout the gene, whilst in others, they occur only at specific locations. We hypothesised that, for genes linked with monogenic diseases that display incomplete penetrance, pLoF variants present in apparently unaffected individuals may be limited to regions where pLoFs are tolerated. To test this, we investigated whether pLoF location could explain instances of incomplete penetrance of variants expected to be pathogenic for Mendelian conditions. METHODS We used exome sequence data in 454,773 individuals in the UK Biobank (UKB) to investigate the locations of pLoFs in a population cohort. We counted numbers of unique pLoF, missense, and synonymous variants in UKB in each quintile of the coding sequence (CDS) of all protein-coding genes and clustered the variants using Gaussian mixture models. We limited the analyses to genes with ≥ 5 variants of each type (16,473 genes). We compared the locations of pLoFs in UKB with all theoretically possible pLoFs in a transcript, and pathogenic pLoFs from ClinVar, and performed simulations to estimate the false-positive rate of non-uniformly distributed variants. RESULTS For most genes, all variant classes fell into clusters representing broadly uniform variant distributions, but genes in which haploinsufficiency causes developmental disorders were less likely to have uniform pLoF distribution than other genes (P < 2.2 × 10⁻⁶). We identified a number of genes, including ARID1B and GATA6, where pLoF variants in the first quarter of the CDS were rescued by the presence of an alternative translation start site and should not be reported as pathogenic.
For other genes, such as ODC1, pLoFs were located approximately uniformly across the gene, but pathogenic pLoFs were clustered only at the end, consistent with a gain-of-function disease mechanism. CONCLUSIONS Our results suggest the potential benefits of localised constraint metrics and that the location of pLoF variants should be considered when interpreting variants.
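The per-quintile counting step described in METHODS can be sketched in a few lines. This is a simplified illustration under stated assumptions: positions are 1-based offsets into the CDS, the function name `quintile_counts` and the toy positions are invented here, and the Gaussian mixture clustering that the authors applied to these counts is not reproduced.

```python
def quintile_counts(positions, cds_length):
    """Count variants in each fifth of a coding sequence.

    positions:  1-based variant positions within the CDS
    cds_length: CDS length in the same units
    Returns a list of five counts, first quintile to last.
    """
    counts = [0] * 5
    for pos in positions:
        if not 1 <= pos <= cds_length:
            raise ValueError(f"position {pos} lies outside the CDS")
        # Integer arithmetic maps positions 1..cds_length onto bins 0..4.
        counts[(pos - 1) * 5 // cds_length] += 1
    return counts


# Toy gene: pLoF variants concentrated at the start of a 1,000-bp CDS.
plof_positions = [10, 50, 120, 180, 199, 450, 990]
counts = quintile_counts(plof_positions, 1000)
fractions = [c / len(plof_positions) for c in counts]
print(counts, [round(f, 2) for f in fractions])
```

A profile that is heavily skewed towards one quintile, as in this toy gene, is the kind of non-uniform distribution the abstract contrasts with the broadly uniform clusters seen for most genes.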
Affiliation(s)
- Robin N Beaumont
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, EX1 2LU, UK
- Gareth Hawkes
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, EX1 2LU, UK
- Adam C Gunning
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, EX1 2LU, UK
  - Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK
- Caroline F Wright
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, EX1 2LU, UK
6
Du ZZ, He JB, Jiao WB. A comprehensive benchmark of graph-based genetic variant genotyping algorithms on plant genomes for creating an accurate ensemble pipeline. Genome Biol 2024; 25:91. [PMID: 38589937 PMCID: PMC11003132 DOI: 10.1186/s13059-024-03239-1] [Received: 07/19/2023] [Accepted: 04/04/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.
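Genotyping recall and precision, the two metrics named in this abstract, can be computed from a truth set and a call set as sketched below. Exact definitions vary between benchmarks, so this assumes one simple convention: a call counts as correct only if the variant is in the truth set with the same genotype. The function name and the variant IDs are hypothetical.

```python
def precision_recall(truth, calls):
    """Genotype-level precision and recall.

    truth, calls: dicts mapping a variant ID to a genotype string.
    A true positive is a called variant whose genotype matches the truth.
    """
    true_pos = sum(1 for vid, gt in calls.items() if truth.get(vid) == gt)
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Toy benchmark: four true SVs, three calls, one with the wrong genotype.
truth = {"sv1": "0/1", "sv2": "1/1", "sv3": "0/1", "sv4": "1/1"}
calls = {"sv1": "0/1", "sv2": "0/1", "sv3": "0/1"}
precision, recall = precision_recall(truth, calls)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here the miscalled `sv2` lowers precision, while the uncalled `sv4` lowers only recall, which is why benchmarks of this kind report both numbers.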
Affiliation(s)
- Ze-Zhen Du
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Jia-Bao He
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Wen-Biao Jiao
  - National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
7
Kentistou KA, Lim BEM, Kaisinger LR, Steinthorsdottir V, Sharp LN, Patel KA, Tragante V, Hawkes G, Gardner EJ, Olafsdottir T, Wood AR, Zhao Y, Thorleifsson G, Day FR, Ozanne SE, Hattersley AT, O'Rahilly S, Stefansson K, Ong KK, Beaumont RN, Perry JRB, Freathy RM. Rare variant associations with birth weight identify genes involved in adipose tissue regulation, placental function and insulin-like growth factor signalling. medRxiv 2024:2024.04.03.24305248. [PMID: 38633783 PMCID: PMC11023655 DOI: 10.1101/2024.04.03.24305248] [Indexed: 04/19/2024]
Abstract
Investigating the genetic factors influencing human birth weight may lead to biological insights into fetal growth and long-term health. Genome-wide association studies of birth weight have highlighted associated variants in more than 200 regions of the genome, but the causal genes are mostly unknown. Rare genetic variants with robust evidence of association are more likely to point to causal genes, but to date, only a few rare variants are known to influence birth weight. We aimed to identify genes that harbour rare variants that impact birth weight when carried by either the fetus or the mother, by analysing whole exome sequence data in UK Biobank participants. We annotated rare (minor allele frequency <0.1%) protein-truncating or high impact missense variants on whole exome sequence data in up to 234,675 participants with data on their own birth weight (fetal variants), and up to 181,883 mothers who reported the birth weight of their first child (maternal variants). Variants within each gene were collapsed to perform gene burden tests and for each associated gene, we compared the observed fetal and maternal effects. We identified 8 genes with evidence of rare fetal variant effects on birth weight, of which 2 also showed maternal effects. One additional gene showed evidence of maternal effects only. We observed 10/11 directionally concordant associations in an independent sample of up to 45,622 individuals (sign test P=0.01). Of the genes identified, IGF1R and PAPPA2 (fetal and maternal-acting) have known roles in insulin-like growth factor bioavailability and signalling. PPARG, INHBE and ACVR1C (all fetal-acting) have known roles in adipose tissue regulation and rare variants in the latter two also showed associations with favourable adiposity patterns in adults. We highlight the dual role of PPARG in both adipocyte differentiation and placental angiogenesis. 
NOS3, NRK, and ADAMTS8 (fetal and maternal-acting) have been implicated in both placental function and hypertension. Analysis of rare coding variants has identified regulators of fetal adipose tissue and fetoplacental angiogenesis as determinants of birth weight, as well as further evidence for the role of insulin-like growth factors.
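The replication result quoted above (10 of 11 directionally concordant associations, sign test P=0.01) can be checked with an exact binomial calculation. The sketch assumes a two-sided sign test with a null concordance probability of 0.5; the abstract does not state which variant of the test was used, and the function name is illustrative.

```python
from math import comb


def sign_test_p(concordant, total):
    """Two-sided exact sign test P value under a null probability of 0.5."""
    if not 0 <= concordant <= total or total == 0:
        raise ValueError("need 0 <= concordant <= total and total > 0")
    # Use the more extreme side, then double the upper-tail probability.
    k = max(concordant, total - concordant)
    tail = sum(comb(total, i) for i in range(k, total + 1)) / 2**total
    return min(1.0, 2 * tail)


p = sign_test_p(10, 11)
print(f"P = {p:.4f}")  # 2 * (11 + 1) / 2**11
```

Under these assumptions the value is 24/2048, or about 0.012, which rounds to the reported P=0.01.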
Affiliation(s)
- Katherine A Kentistou
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
- Brandon E M Lim
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Lena R Kaisinger
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
- Luke N Sharp
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Kashyap A Patel
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Gareth Hawkes
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Eugene J Gardner
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
- Andrew R Wood
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Yajie Zhao
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
- Felix R Day
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
- Susan E Ozanne
  - MRC Metabolic Diseases Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Andrew T Hattersley
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Stephen O'Rahilly
  - MRC Metabolic Diseases Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Kari Stefansson
  - deCODE genetics/Amgen, Inc., 102 Reykjavik, Iceland
  - Faculty of Medicine, University of Iceland, 101 Reykjavik, Iceland
- Ken K Ong
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
  - Department of Paediatrics, University of Cambridge, Cambridge CB2 0QQ, UK
- Robin N Beaumont
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- John R B Perry
  - MRC Epidemiology Unit, Box 285 Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge CB2 0QQ, UK
  - MRC Metabolic Diseases Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, University of Cambridge, Cambridge CB2 0QQ, UK
- Rachel M Freathy
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
8
Guan D, Sun S, Song L, Zhao P, Nie Y, Huang X, Zhou W, Yan L, Lei Y, Hu Y, Wei F. Taking a color photo: A homozygous 25-bp deletion in Bace2 may cause brown-and-white coat color in giant pandas. Proc Natl Acad Sci U S A 2024; 121:e2317430121. [PMID: 38437540 PMCID: PMC10945837 DOI: 10.1073/pnas.2317430121] [Received: 10/09/2023] [Accepted: 12/30/2023] [Indexed: 03/06/2024]
Abstract
Brown-and-white giant pandas (hereafter brown pandas) are distinct coat color mutants found exclusively in the Qinling Mountains, Shaanxi, China. However, the genetic mechanism underlying this coloration has remained unclear since their discovery in 1985. Here, we identified the genetic basis for this coat color variation using a combination of field ecological data, population genomic data, and a CRISPR-Cas9 knockout mouse model. We de novo assembled a long-read-based giant panda genome and resequenced the genomes of 35 giant pandas, including two brown pandas and two family trios associated with a brown panda. We identified a homozygous 25-bp deletion in the first exon of Bace2, a gene encoding amyloid precursor protein cleaving enzyme, as the most likely genetic basis for brown-and-white coat color. This deletion was further validated using PCR and Sanger sequencing of another 192 black giant pandas and CRISPR-Cas9-edited knockout mice. Our investigation revealed that this mutation reduced the number and size of melanosomes in the hairs of knockout mice, and possibly of the brown panda, leading to hypopigmentation. These findings provide unique insights into the genetic basis of coat color variation in wild animals.
Affiliation(s)
- Dengfeng Guan
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - Jiangxi Provincial Key Laboratory of Conservation Biology, Jiangxi Agricultural University, Nanchang 330045, China
- Shuyan Sun
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lingyun Song
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Pengpeng Zhao
  - Shaanxi (Louguantai) Rescue and Breeding Center for Rare Wildlife, Xi’an 710402, China
- Yonggang Nie
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xin Huang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenliang Zhou
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
- Li Yan
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Yinghu Lei
  - Shaanxi (Louguantai) Rescue and Breeding Center for Rare Wildlife, Xi’an 710402, China
- Yibo Hu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Fuwen Wei
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
  - Jiangxi Provincial Key Laboratory of Conservation Biology, Jiangxi Agricultural University, Nanchang 330045, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
| |
|
9
|
Linderman MD, Wallace J, van der Heyde A, Wieman E, Brey D, Shi Y, Hansen P, Shamsi Z, Liu J, Gelb BD, Bashir A. NPSV-deep: a deep learning method for genotyping structural variants in short read genome sequencing data. Bioinformatics 2024; 40:btae129. [PMID: 38444093 PMCID: PMC10955255 DOI: 10.1093/bioinformatics/btae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/15/2024] [Accepted: 03/04/2024] [Indexed: 03/07/2024] Open
Abstract
MOTIVATION Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation. RESULTS NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs. AVAILABILITY AND IMPLEMENTATION Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.
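The similarity-based genotyping idea described in the abstract above can be illustrated with a toy sketch. It is not the NPSV-deep implementation: the real method compares pileup images through a learned model, whereas this sketch assumes each image has already been reduced to a short numeric feature vector and uses plain Euclidean distance. The names `call_genotype`, `real_features` and `simulated_features` are hypothetical.

```python
import math

# Diploid genotypes for a single SV: homozygous reference, heterozygous,
# homozygous alternate.
GENOTYPES = ("0/0", "0/1", "1/1")


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def call_genotype(real_features, simulated_features):
    """Pick the genotype whose simulated features are closest to the real data.

    real_features: feature vector derived from the actual reads (assumed input).
    simulated_features: dict mapping each genotype to the feature vector of a
        simulation generated under that genotype (assumed input).
    """
    distances = {
        gt: euclidean(real_features, simulated_features[gt]) for gt in GENOTYPES
    }
    best = min(distances, key=distances.get)
    return best, distances


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    sims = {"0/0": [0.0, 0.0], "0/1": [0.5, 0.5], "1/1": [1.0, 1.0]}
    genotype, dists = call_genotype([0.5, 0.4], sims)
    print(genotype, dists)
```

The point of the design is that the simulations supply a per-variant reference for what each genotype should look like, so the call reduces to a nearest-match decision.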
Affiliation(s)
- Michael D Linderman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Jacob Wallace
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Alderik van der Heyde
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Eliza Wieman
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Daniel Brey
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Yiran Shi
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | - Peter Hansen
- Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States
| | | | | | - Bruce D Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
| | - Ali Bashir
- Google, Mountain View, CA 94043, United States
| |
|
10
|
Groza C, Schwendinger-Schreck C, Cheung WA, Farrow EG, Thiffault I, Lake J, Rizzo WB, Evrony G, Curran T, Bourque G, Pastinen T. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat Commun 2024; 15:657. [PMID: 38253606 PMCID: PMC10803329 DOI: 10.1038/s41467-024-44980-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 01/10/2024] [Indexed: 01/24/2024] Open
Abstract
Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
Affiliation(s)
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, QC, Canada
| | | | - Warren A Cheung
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
| | - Emily G Farrow
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
| | - Isabelle Thiffault
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
| | | | - William B Rizzo
- Child Health Research Institute, Department of Pediatrics, Nebraska Medical Center, Omaha, NE, USA
| | - Gilad Evrony
- Center for Human Genetics and Genomics, Department of Pediatrics, Neuroscience & Physiology, New York University Grossman School of Medicine, New York, NY, USA
| | - Tom Curran
- Children's Mercy Research Institute, Kansas City, MO, USA
| | - Guillaume Bourque
- Canadian Center for Computational Genomics, McGill University, Montréal, QC, Canada.
- Department of Human Genetics, McGill University, Montréal, QC, Canada.
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan.
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada.
| | - Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.
| |
|
11
|
Shen Y, Yu L, Qiu Y, Zhang T, Kingsford C. Improving Hi-C contact matrices using genome graphs. bioRxiv 2023:2023.11.08.566275. [PMID: 37986943 PMCID: PMC10659349 DOI: 10.1101/2023.11.08.566275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Three-dimensional chromosome structure plays an important role in fundamental genomic functions. Hi-C, a high-throughput, sequencing-based technique, has drastically expanded our comprehension of 3D chromosome structures. The first step of the Hi-C analysis pipeline involves mapping sequencing reads from Hi-C to linear reference genomes. However, the linear reference genome does not incorporate genetic variation information, which can lead to incorrect read alignments, especially when analyzing samples with substantial genomic differences from the reference, such as cancer samples. Using genome graphs as the reference facilitates more accurate mapping of reads; however, new algorithms are required for inferring linear genomes from Hi-C reads mapped on genome graphs and constructing corresponding Hi-C contact matrices, which is a prerequisite for the subsequent steps of Hi-C analysis, such as identifying topologically associated domains and calling chromatin loops. We introduce the problem of genome sequence inference from Hi-C data mediated by genome graphs. We formalize this problem, show the hardness of solving it, and introduce a novel heuristic algorithm specifically tailored to it. We provide a theoretical analysis to evaluate the efficacy of our algorithm. Finally, our empirical experiments indicate that the linear genomes inferred by our method lead to improved Hi-C contact matrices. These enhanced matrices show a reduction in erroneous patterns caused by structural variations and are more effective in accurately capturing the structures of topologically associated domains.
Affiliation(s)
- Yihang Shen
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
| | - Lingge Yu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
| | - Yutong Qiu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
| | - Tianyu Zhang
- Department of Statistics and Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
| | - Carl Kingsford
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
| |
|
12
|
Zhang Z, van Treuren R, Yang T, Hu Y, Zhou W, Liu H, Wei T. A comprehensive lettuce variation map reveals the impact of structural variations in agronomic traits. BMC Genomics 2023; 24:659. [PMID: 37919641 PMCID: PMC10621239 DOI: 10.1186/s12864-023-09739-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 10/12/2023] [Indexed: 11/04/2023] Open
Abstract
BACKGROUND As an important vegetable crop, cultivated lettuce is grown worldwide, and a great variety of agronomic traits have been preserved within germplasm collections. The mechanisms underlying these phenotypic variations remain to be elucidated in association with sequence variations. Compared with single nucleotide polymorphisms, structural variations (SVs), which have a greater impact on gene function, remain largely uncharacterized in the lettuce genome. RESULTS Here, we produced a comprehensive SV set for 333 wild and cultivated lettuce accessions. Comparison of SV frequencies showed that the SVs prevalent in L. sativa affected genes enriched in carbohydrate derivative catabolic and secondary metabolic processes. Genome-wide association analysis of seven agronomic traits uncovered potentially causal SVs associated with seed coat color and leaf anthocyanin content. CONCLUSION Our work characterized a great abundance of SVs in the lettuce genome and provides a valuable genomic resource for future lettuce breeding.
Affiliation(s)
- Zhaowu Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China
| | - Rob van Treuren
- Centre for Genetic Resources, the Netherlands, Wageningen University & Research, Wageningen, the Netherlands
| | - Ting Yang
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China
| | - Yulan Hu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China
| | - Wenhui Zhou
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China
| | - Huan Liu
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China.
| | - Tong Wei
- State Key Laboratory of Agricultural Genomics, BGI Research, Shenzhen, 518083, China.
| |
|
13
|
Lee WP, Wang H, Dombroski B, Cheng PL, Tucci A, Si YQ, Farrell J, Tzeng JY, Leung YY, Malamon J, Wang LS, Vardarajan B, Farrer L, Schellenberg G. Structural Variation Detection and Association Analysis of Whole-Genome-Sequence Data from 16,905 Alzheimer's Diseases Sequencing Project Subjects. Research Square 2023:rs.3.rs-3353179. [PMID: 37886469 PMCID: PMC10602095 DOI: 10.21203/rs.3.rs-3353179/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Structural variations (SVs) are important contributors to the genetics of human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. We analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (N = 16,905) and identified 400,234 (168,223 high-quality) SVs. Laboratory validation yielded a sensitivity of 82% (85% for high-quality). We found a significant burden of deletions and duplications in AD cases, particularly for singletons and homozygous events. In AD genes, we observed ultra-rare SVs associated with the disease, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1. Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, exemplified by a 5k deletion in complete LD with rs143080277 in NCK2. We also identified 16 SVs associated with AD and 13 SVs linked to AD-related pathological/cognitive endophenotypes. This study highlights the pivotal role of SVs in shaping our understanding of AD genetics.
|
14
|
Xie S, Isaacs K, Becker G, Murdoch BM. A computational framework for improving genetic variants identification from 5,061 sheep sequencing data. J Anim Sci Biotechnol 2023; 14:127. [PMID: 37779189 PMCID: PMC10544426 DOI: 10.1186/s40104-023-00923-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 08/01/2023] [Indexed: 10/03/2023] Open
Abstract
BACKGROUND Pan-genomics is a recently emerging strategy that can be utilized to provide a more comprehensive characterization of genetic variation. Joint calling is routinely used to combine identified variants across multiple related samples. However, the improvement in variant identification gained from the mutual support information of multiple samples remains quite limited for population-scale genotyping. RESULTS In this study, we developed a computational framework for joint calling of genetic variants from 5,061 sheep by incorporating the sequencing error and optimizing mutual support information from multiple samples' data. The variants were identified from multiple samples in four steps: (1) Probabilities of variants from two widely used algorithms, GATK and Freebayes, were calculated by a Poisson model incorporating base sequencing error potential; (2) The variants with high mapping quality or consistently identified from at least two samples by GATK and Freebayes were used to construct the raw high-confidence identification (rHID) variants database; (3) The high-confidence variants identified in a single sample were ordered by probability value and controlled by false discovery rate (FDR) using the rHID database; (4) To avoid eliminating potentially true variants from the rHID database, the variants that failed FDR were reexamined to rescue potentially true variants and ensure highly accurate variant identification. The results indicated that the percentage of concordant SNPs and indels from Freebayes and GATK improved significantly, by 12%-32%, with our new method compared with raw variants, and that the method identified low-frequency variants in individual sheep involved in several traits, including nipple number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154).
CONCLUSION The new method used the computational strategy to reduce the number of false positives, and simultaneously improve the identification of genetic variants. This strategy did not incur any extra cost by using any additional samples or sequencing data information and advantageously identified rare variants which can be important for practical applications of animal breeding.
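Steps (1) and (3) of the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' framework: the Poisson tail probability treats alternate-allele reads as sequencing errors with mean `depth * error_rate`, and the FDR step uses the standard Benjamini-Hochberg procedure as a stand-in, since the abstract does not spell out how the rHID database enters the FDR control. All function names and the example numbers are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF up to k-1."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def error_pvalue(alt_reads, depth, error_rate):
    """Probability of seeing at least `alt_reads` alternate reads by error alone.

    Assumes errors are Poisson distributed with mean depth * error_rate.
    """
    return poisson_sf(alt_reads, depth * error_rate)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the p-value passes BH at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank satisfying the BH condition
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


if __name__ == "__main__":
    # Hypothetical candidate sites: (alternate reads, depth) at a 1% error rate.
    sites = [(10, 30), (1, 30), (4, 25), (0, 40)]
    pvals = [error_pvalue(alt, depth, 0.01) for alt, depth in sites]
    print(list(zip(sites, benjamini_hochberg(pvals))))
```

Sites that fail the FDR step would, in the workflow described above, go on to the reexamination in step (4) rather than being discarded outright.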
Affiliation(s)
- Shangqian Xie
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
| | | | - Gabrielle Becker
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
| | - Brenda M Murdoch
- Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA.
| |
|
15
|
Hytönen MK, Viitanen S, Hundi S, Donner J, Lohi H, Kaukonen M. A frameshift deletion in F8 associated with hemophilia A in Labrador Retriever dogs. Anim Genet 2023; 54:606-612. [PMID: 37438956 DOI: 10.1111/age.13345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/27/2023] [Accepted: 06/27/2023] [Indexed: 07/14/2023]
Abstract
Hemophilia A is the most common inherited coagulation factor disorder in dogs. It manifests as excessive bleeding resulting from pathogenic variants in the X-chromosomal F8 gene encoding coagulation factor VIII (FVIII) protein. In this study, we performed careful clinical phenotyping to confirm hemophilia A in two distinct Labrador Retriever (LR) pedigrees. Whole-genome sequencing on an affected dog from litter 1 identified a case-specific frameshift deletion variant in F8 predicted to cause a premature stop codon (c.2923_2924del, p.(E975Kfs*8)). This variant was hemizygous in all the affected males from litter 1 (n = 3), while all the unaffected LRs in the pedigree were heterozygous or wild-type (n = 22). Additionally, screened samples from 199 LRs were all found to be wild-type. As a result of this study, a gene test can now be developed to screen dogs before breeding to prevent further cases. However, it is important to note that the affected LR with decreased FVIII activity from litter 2 was wild-type for the identified deletion variant, and no segregating F8 variants were detected when this dog's DNA sample was whole-genome sequenced. Thus, the cause of decreased FVIII activity in this dog remains to be unraveled in future studies.
Affiliation(s)
- Marjo K Hytönen
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Sanna Viitanen
- Department of Equine and Small Animal Medicine, University of Helsinki, Helsinki, Finland
| | - Sruthi Hundi
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Jonas Donner
- Wisdom Panel Research Team, Wisdom Panel, Kinship, Helsinki, Finland
| | - Hannes Lohi
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| | - Maria Kaukonen
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
| |
|
16
|
Liu Q, Xie B, Gao Y, Xu S, Lu Y. A protocol for applying low-coverage whole-genome sequencing data in structural variation studies. STAR Protoc 2023; 4:102433. [PMID: 37432854 PMCID: PMC10362160 DOI: 10.1016/j.xpro.2023.102433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 05/23/2023] [Accepted: 06/12/2023] [Indexed: 07/13/2023] Open
Abstract
Structural variations (SVs) have a great impact on various biological processes and influence physical traits in many species. Here, we present a protocol for applying low-coverage next-generation sequencing data of Rhipicephalus microplus to accurately detect highly differentiated SVs. We also outline its use to investigate population/species-specific genetic structures, local adaptation, and transcriptional function. We describe steps for constructing variation maps and SV annotation. We then detail population genetic analysis and differential gene expression analysis. For complete details on the usage and execution of this protocol, please refer to Liu et al. (2023).
Affiliation(s)
- Qi Liu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China
| | - Bo Xie
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yang Gao
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China.
| | - Yan Lu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China.
| |
|
17
|
Wilcox N, Dumont M, González-Neira A, Carvalho S, Joly Beauparlant C, Crotti M, Luccarini C, Soucy P, Dubois S, Nuñez-Torres R, Pita G, Gardner EJ, Dennis J, Alonso MR, Álvarez N, Baynes C, Collin-Deschesnes AC, Desjardins S, Becher H, Behrens S, Bolla MK, Castelao JE, Chang-Claude J, Cornelissen S, Dörk T, Engel C, Gago-Dominguez M, Guénel P, Hadjisavvas A, Hahnen E, Hartman M, Herráez B, Jung A, Keeman R, Kiechle M, Li J, Loizidou MA, Lush M, Michailidou K, Panayiotidis MI, Sim X, Teo SH, Tyrer JP, van der Kolk LE, Wahlström C, Wang Q, Perry JRB, Benitez J, Schmidt MK, Schmutzler RK, Pharoah PDP, Droit A, Dunning AM, Kvist A, Devilee P, Easton DF, Simard J. Exome sequencing identifies breast cancer susceptibility genes and defines the contribution of coding variants to breast cancer risk. Nat Genet 2023; 55:1435-1439. [PMID: 37592023 PMCID: PMC10484782 DOI: 10.1038/s41588-023-01466-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/17/2022] [Accepted: 07/05/2023] [Indexed: 08/19/2023]
Abstract
Linkage and candidate gene studies have identified several breast cancer susceptibility genes, but the overall contribution of coding variation to breast cancer is unclear. To evaluate the role of rare coding variants more comprehensively, we performed a meta-analysis across three large whole-exome sequencing datasets, containing 26,368 female cases and 217,673 female controls. Burden tests were performed for protein-truncating and rare missense variants in 15,616 and 18,601 genes, respectively. Associations between protein-truncating variants and breast cancer were identified for the following six genes at exome-wide significance (P < 2.5 × 10⁻⁶): the five known susceptibility genes ATM, BRCA1, BRCA2, CHEK2 and PALB2, together with MAP3K1. Associations were also observed for LZTR1, ATR and BARD1 with P < 1 × 10⁻⁴. Associations between predicted deleterious rare missense or protein-truncating variants and breast cancer were additionally identified for CDKN2A at exome-wide significance. The overall contribution of coding variants in genes beyond the previously known genes is estimated to be small.
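As a worked illustration of the exome-wide threshold quoted above: 2.5 × 10⁻⁶ is the value a Bonferroni correction of 0.05 gives for roughly 20,000 genes, which is the conventional derivation. The abstract itself does not state how its threshold was derived, so that reading is an assumption. The sketch also shows the carrier-versus-non-carrier odds ratio that a simple gene-level burden comparison rests on. The carrier counts in the usage example are hypothetical; only the case and control totals come from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Odds ratio for carrying any qualifying variant in a gene, cases vs controls."""
    a = case_carriers                    # cases carrying a variant
    b = n_cases - case_carriers          # cases without one
    c = control_carriers                 # controls carrying a variant
    d = n_controls - control_carriers    # controls without one
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Assumed derivation: 0.05 spread over about 20,000 genes.
    print(bonferroni_threshold(0.05, 20_000))
    # Hypothetical carrier counts (100 and 400); totals are from the abstract.
    print(odds_ratio(100, 26_368, 400, 217_673))
```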
Affiliation(s)
- Naomi Wilcox
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Martine Dumont
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Anna González-Neira
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Sara Carvalho
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Charles Joly Beauparlant
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Marco Crotti
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Craig Luccarini
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Penny Soucy
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Stéphane Dubois
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Rocio Nuñez-Torres
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Guillermo Pita
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Eugene J Gardner
- MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
| | - Joe Dennis
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - M Rosario Alonso
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Nuria Álvarez
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Caroline Baynes
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Annie Claude Collin-Deschesnes
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Sylvie Desjardins
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| | - Heiko Becher
- Institute of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Sabine Behrens
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Manjeet K Bolla
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Jose E Castelao
- Oncology and Genetics Unit, Instituto de Investigación Sanitaria Galicia Sur (IISGS), Xerencia de Xestion Integrada de Vigo-SERGAS, Vigo, Spain
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cancer Epidemiology Group, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Sten Cornelissen
- Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Thilo Dörk
- Gynaecology Research Unit, Hannover Medical School, Hannover, Germany
| | - Christoph Engel
- Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
- LIFE-Leipzig Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany
| | - Manuela Gago-Dominguez
- Cancer Genetics and Epidemiology Group, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS) Foundation, Complejo Hospitalario Universitario de Santiago, SERGAS, Santiago de Compostela, Spain
| | - Pascal Guénel
- Team 'Exposome and Heredity,' CESP, Gustave Roussy, INSERM, University Paris-Saclay, UVSQ, Villejuif, France
| | - Andreas Hadjisavvas
- Department of Cancer Genetics, Therapeutics and Ultrastructural Pathology, The Cyprus Institute of Neurology & Genetics, Nicosia, Cyprus
| | - Eric Hahnen
- Center for Familial Breast and Ovarian Cancer, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Center for Integrated Oncology (CIO), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Mikael Hartman
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore City, Singapore
- Department of Surgery, National University Health System, Singapore City, Singapore
- Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore City, Singapore
| | - Belén Herráez
- Human Genotyping Unit-CeGen, Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Audrey Jung
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Renske Keeman
- Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Marion Kiechle
- Division of Gynaecology and Obstetrics, Klinikum rechts der Isar der Technischen Universität München, Munich, Germany
| | - Jingmei Li
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore City, Singapore.
| | - Maria A Loizidou
- Department of Cancer Genetics, Therapeutics and Ultrastructural Pathology, The Cyprus Institute of Neurology & Genetics, Nicosia, Cyprus
| | - Michael Lush
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Kyriaki Michailidou
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Biostatistics Unit, The Cyprus Institute of Neurology & Genetics, Nicosia, Cyprus
| | - Mihalis I Panayiotidis
- Department of Cancer Genetics, Therapeutics and Ultrastructural Pathology, The Cyprus Institute of Neurology & Genetics, Nicosia, Cyprus
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore City, Singapore
| | - Soo Hwang Teo
- Breast Cancer Research Programme, Cancer Research Malaysia, Subang Jaya, Malaysia
- Department of Surgery, Faculty of Medicine, University of Malaya, UM Cancer Research Institute, Kuala Lumpur, Malaysia
| | - Jonathan P Tyrer
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Lizet E van der Kolk
- Family Cancer Clinic, The Netherlands Cancer Institute-Antoni van Leeuwenhoek hospital, Amsterdam, the Netherlands
| | - Cecilia Wahlström
- Division of Oncology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Qin Wang
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - John R B Perry
- MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
- Metabolic Research Laboratory, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
| | - Javier Benitez
- Human Genetics Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Centre for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
| | - Marjanka K Schmidt
- Division of Molecular Pathology, The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Division of Psychosocial Research and Epidemiology, The Netherlands Cancer Institute-Antoni van Leeuwenhoek hospital, Amsterdam, the Netherlands
| | - Rita K Schmutzler
- Center for Familial Breast and Ovarian Cancer, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Center for Integrated Oncology (CIO), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
| | - Paul D P Pharoah
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Arnaud Droit
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
- Département de Médecine Moléculaire, Faculté de Médecine, Centre Hospitalier Universitaire de Québec Research Center, Laval University, Québec City, Quebec, Canada
| | - Alison M Dunning
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Anders Kvist
- Division of Oncology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Peter Devilee
- Department of Pathology, Leiden University Medical Center, Leiden, the Netherlands
- Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - Douglas F Easton
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Cambridge, UK.
| | - Jacques Simard
- Genomics Center, Centre Hospitalier Universitaire de Québec-Université Laval Research Center, Québec City, Quebec, Canada
| |
|
18
|
Antinucci M, Comas D, Calafell F. Population history modulates the fitness effects of Copy Number Variation in the Roma. Hum Genet 2023; 142:1327-1343. [PMID: 37311904 PMCID: PMC10449987 DOI: 10.1007/s00439-023-02579-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 06/02/2023] [Indexed: 06/15/2023]
Abstract
We provide the first whole genome Copy Number Variant (CNV) study addressing Roma, along with reference populations from South Asia, the Middle East and Europe. Using CNV calling software for short-read sequence data, we identified 3171 deletions and 489 duplications. Taking into account the known population history of the Roma, as inferred from whole genome nucleotide variation, we could discern how this history has shaped CNV variation. As expected, patterns of deletion variation, but not duplication, in the Roma followed those obtained from single nucleotide polymorphisms (SNPs). Reduced effective population size resulting in slightly relaxed natural selection may explain our observation of an increase in intronic (but not exonic) deletions within Loss of Function (LoF)-intolerant genes. Over-representation analysis for LoF-intolerant gene sets hosting intronic deletions highlights a substantial accumulation of shared biological processes in Roma, intriguingly related to signaling, nervous system and development features, which may be related to the known profile of private disease in the population. Finally, we show the link between deletions and known trait-related SNPs reported in the genome-wide association study (GWAS) catalog, which exhibited even frequency distributions among the studied populations. This suggests that, in general human populations, the strong association between deletions and SNPs associated with biomedical conditions and traits could be widespread across continental populations, reflecting a common background of potentially disease/trait-related CNVs.
Affiliation(s)
- Marco Antinucci
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - David Comas
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Francesc Calafell
- Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain.
| |
|
19
|
Meadows JRS, Kidd JM, Wang GD, Parker HG, Schall PZ, Bianchi M, Christmas MJ, Bougiouri K, Buckley RM, Hitte C, Nguyen AK, Wang C, Jagannathan V, Niskanen JE, Frantz LAF, Arumilli M, Hundi S, Lindblad-Toh K, Ginja C, Agustina KK, André C, Boyko AR, Davis BW, Drögemüller M, Feng XY, Gkagkavouzis K, Iliopoulos G, Harris AC, Hytönen MK, Kalthoff DC, Liu YH, Lymberakis P, Poulakakis N, Pires AE, Racimo F, Ramos-Almodovar F, Savolainen P, Venetsani S, Tammen I, Triantafyllidis A, vonHoldt B, Wayne RK, Larson G, Nicholas FW, Lohi H, Leeb T, Zhang YP, Ostrander EA. Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture. Genome Biol 2023; 24:187. [PMID: 37582787 PMCID: PMC10426128 DOI: 10.1186/s13059-023-03023-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/23/2022] [Accepted: 07/25/2023] [Indexed: 08/17/2023] Open
Abstract
BACKGROUND: The international Dog10K project aims to sequence and analyze several thousand canine genomes. Incorporating 20× data from 1987 individuals, including 1611 dogs (321 breeds), 309 village dogs, 63 wolves, and four coyotes, we identify genomic variation across the canid family, setting the stage for detailed studies of domestication, behavior, morphology, disease susceptibility, and genome architecture and function. RESULTS: We report the analysis of >48 M single-nucleotide, indel, and structural variants spanning the autosomes, X chromosome, and mitochondria. We discover more than 75% of variation for 239 sampled breeds. Allele sharing analysis indicates that 94.9% of breeds form monophyletic clusters and 25 major clades. German Shepherd Dogs and related breeds show the highest allele sharing with independent breeds from multiple clades. On average, each breed dog differs from the UU_Cfam_GSD_1.0 reference at 26,960 deletions and 14,034 insertions greater than 50 bp, with wolves having 14% more variants. Discovered variants include retrogene insertions from 926 parent genes. To aid functional prioritization, single-nucleotide variants were annotated with SnpEff and Zoonomia phyloP constraint scores. Constrained positions were negatively correlated with allele frequency. Finally, the utility of the Dog10K data as an imputation reference panel is assessed, generating high-confidence calls across varied genotyping platform densities including for breeds not included in the Dog10K collection. CONCLUSIONS: We have developed a dense dataset of 1987 sequenced canids that reveals patterns of allele sharing, identifies likely functional variants, informs breed structure, and enables accurate imputation. Dog10K data are publicly available.
Affiliation(s)
- Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden.
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA.
| | - Guo-Dong Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Heidi G Parker
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Peter Z Schall
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA
| | - Matteo Bianchi
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Matthew J Christmas
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Katia Bougiouri
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
| | - Reuben M Buckley
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Christophe Hitte
- University of Rennes, CNRS, Institute Genetics and Development Rennes - UMR6290, 35000, Rennes, France
| | - Anthony K Nguyen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA
| | - Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Vidhya Jagannathan
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Julia E Niskanen
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Laurent A F Frantz
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E14NS, UK and Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, D-80539, Munich, Germany
| | - Meharji Arumilli
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Sruthi Hundi
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Catarina Ginja
- BIOPOLIS-CIBIO-InBIO-Centro de Investigação Em Biodiversidade E Recursos Genéticos - ArchGen Group, Universidade Do Porto, 4485-661, Vairão, Portugal
| | | | - Catherine André
- University of Rennes, CNRS, Institute Genetics and Development Rennes - UMR6290, 35000, Rennes, France
| | - Adam R Boyko
- Department of Biomedical Sciences, Cornell University, 930 Campus Road, Ithaca, NY, 14853, USA
| | - Brian W Davis
- Department of Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843, USA
| | - Michaela Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Xin-Yao Feng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Konstantinos Gkagkavouzis
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Macedonia 54124, Greece and Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, Greece
| | - Giorgos Iliopoulos
- NGO "Callisto", Wildlife and Nature Conservation Society, 54621, Thessaloniki, Greece
| | - Alexander C Harris
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Marjo K Hytönen
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Daniela C Kalthoff
- NGO "Callisto", Wildlife and Nature Conservation Society, 54621, Thessaloniki, Greece
| | - Yan-Hu Liu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Petros Lymberakis
- Natural History Museum of Crete & Department of Biology, University of Crete, 71202, Irakleio, Greece
- Biology Department, School of Sciences and Engineering, University of Crete, Heraklion, Greece
- Palaeogenomics and Evolutionary Genetics Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
| | - Nikolaos Poulakakis
- Natural History Museum of Crete & Department of Biology, University of Crete, 71202, Irakleio, Greece
- Biology Department, School of Sciences and Engineering, University of Crete, Heraklion, Greece
- Palaeogenomics and Evolutionary Genetics Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
| | - Ana Elisabete Pires
- BIOPOLIS-CIBIO-InBIO-Centro de Investigação Em Biodiversidade E Recursos Genéticos - ArchGen Group, Universidade Do Porto, 4485-661, Vairão, Portugal
| | - Fernando Racimo
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
| | | | - Peter Savolainen
- Department of Gene Technology, Science for Life Laboratory, KTH - Royal Institute of Technology, 17121, Solna, Sweden
| | - Semina Venetsani
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, 54124, Thessaloniki, Macedonia, Greece
| | - Imke Tammen
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2570, Australia
| | - Alexandros Triantafyllidis
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Macedonia 54124, Greece and Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, Greece
| | - Bridgett vonHoldt
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, 08544, USA
| | - Robert K Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-7246, USA
| | - Greger Larson
- Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, OX1 3TG, UK
| | - Frank W Nicholas
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2570, Australia
| | - Hannes Lohi
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Ya-Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Elaine A Ostrander
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA.
| |
|
20
|
Heinonen T, Flegel T, Müller H, Kehl A, Hundi S, Matiasek K, Fischer A, Donner J, Forman OP, Lohi H, Hytönen MK. A loss-of-function variant in canine GLRA1 associates with a neurological disorder resembling human hyperekplexia. Hum Genet 2023; 142:1221-1230. [PMID: 37222814 PMCID: PMC10449970 DOI: 10.1007/s00439-023-02571-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 05/08/2023] [Indexed: 05/25/2023]
Abstract
Hereditary hyperekplexia is a rare neuronal disorder characterized by an exaggerated startle response to sudden tactile or acoustic stimuli. In this study, we present a Miniature Australian Shepherd family showing clinical signs, which have genetic and phenotypic similarities with human hereditary hyperekplexia: episodes of muscle stiffness that could occasionally be triggered by acoustic stimuli. Whole genome sequence data analysis of two affected dogs revealed a 36-bp deletion spanning the exon-intron boundary in the glycine receptor alpha 1 (GLRA1) gene. Further validation in pedigree samples and an additional cohort of 127 Miniature Australian Shepherds, 45 Miniature American Shepherds and 74 Australian Shepherds demonstrated complete segregation of the variant with the disease, according to an autosomal recessive inheritance pattern. The protein encoded by GLRA1 is a subunit of the glycine receptor, which mediates postsynaptic inhibition in the brain stem and spinal cord. The canine GLRA1 deletion is located in the signal peptide and is predicted to cause exon skipping and subsequent premature stop codon resulting in a significant defect in glycine signaling. Variants in GLRA1 are known to cause hereditary hyperekplexia in humans; however, this is the first study to associate a variant in canine GLRA1 with the disorder, establishing a spontaneous large animal disease model for the human condition.
Affiliation(s)
- Tiina Heinonen
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
| | - Thomas Flegel
- Department of Small Animals, Leipzig University, Leipzig, Germany
| | - Hanna Müller
- Tieraerztliches Fachzentrum Muehlhausen Dr. Ortmann & Dr. Stief, Muehlhausen/Thueringen, Germany
| | | | - Sruthi Hundi
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
- Folkhälsan Research Center, Helsinki, Finland
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
| | - Kaspar Matiasek
- Section of Clinical and Comparative Neuropathology, Institute of Veterinary Pathology, Centre for Clinical Veterinary Medicine, LMU Munich, Munich, Germany
| | - Andrea Fischer
- Clinic of Small Animal Medicine, Centre for Clinical Veterinary Medicine, LMU Munich, Munich, Germany
| | - Jonas Donner
- Wisdom Panel Research Team, Wisdom Panel, Kinship, Helsinki, Finland
| | - Oliver P Forman
- Wisdom Panel Research Team, Wisdom Panel, Kinship, Leicestershire, UK
| | - Hannes Lohi
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland.
- Folkhälsan Research Center, Helsinki, Finland.
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland.
| | - Marjo K Hytönen
- Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland.
- Folkhälsan Research Center, Helsinki, Finland.
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland.
| |
|
21
|
Saei H, Morinière V, Heidet L, Gribouval O, Lebbah S, Tores F, Mautret-Godefroy M, Knebelmann B, Burtey S, Vuiblet V, Antignac C, Nitschké P, Dorval G. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data in autosomal dominant tubulointerstitial kidney disease. iScience 2023; 26:107171. [PMID: 37456840 PMCID: PMC10338300 DOI: 10.1016/j.isci.2023.107171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 05/06/2023] [Accepted: 06/14/2023] [Indexed: 07/18/2023] Open
Abstract
The human genome comprises approximately 3% of tandem repeats with variable length (VNTR), a few of which have been linked to human rare diseases. Autosomal dominant tubulointerstitial kidney disease-MUC1 (ADTKD-MUC1) is caused by specific frameshift variants in the coding VNTR of the MUC1 gene. Calling variants from VNTR using short-read sequencing (SRS) is challenging due to poor read mappability. We developed a computational pipeline, VNtyper, for reliable detection of MUC1 VNTR pathogenic variants and demonstrated its clinical utility in two distinct cohorts: (1) a historical cohort including 108 families with ADTKD and (2) a replication naive cohort comprising 2,910 patients previously tested on a panel of genes involved in monogenic renal diseases. In the historical cohort all cases known to carry pathogenic MUC1 variants were re-identified, and a new 25bp-frameshift insertion in an additional mislaid family was detected. In the replication cohort, we discovered and validated 30 new patients.
Affiliation(s)
- Hassan Saei
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
| | - Vincent Morinière
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Laurence Heidet
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Néphrologie Pédiatrique, Centre de Référence MARHEA, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Olivier Gribouval
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
| | - Said Lebbah
- Département de Santé Publique, Unité de Recherche Clinique, Hôpital Pitié-Salpêtrière, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Frederic Tores
- Plateforme Bio-informatique, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
| | - Manon Mautret-Godefroy
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Bertrand Knebelmann
- Service de Néphrologie, Centre de Référence MARHEA, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Stéphane Burtey
- Inserm, C2VN, INRAE, C2VN, Aix-Marseille Université, Marseille, France
- Centre de Néphrologie et Transplantation Rénale, AP-HM Hôpital de la Conception, Marseille, France
| | - Vincent Vuiblet
- Service de Néphrologie, CHU de Reims, Reims, France
- Service de Pathologie, CHU De Reims, Reims, France
- Institut d'Intelligence Artificielle en Santé, Université de Reims Champagne-Ardenne et CHU de Reims, Reims, France
| | - Corinne Antignac
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| | - Patrick Nitschké
- Plateforme Bio-informatique, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
| | - Guillaume Dorval
- Laboratoire des Maladies Rénales Héréditaires, Inserm UMR 1163, Institut Imagine, Université Paris Cité, Paris, France
- Service de Médecine Génomique des Maladies Rares, Hôpital Necker-Enfants Malades, Assistance publique, Hôpitaux de Paris (AP-HP), Paris, France
| |
|
22
|
Houwaart T, Scholz S, Pollock NR, Palmer WH, Kichula KM, Strelow D, Le DB, Belick D, Hülse L, Lautwein T, Wachtmeister T, Wollenweber TE, Henrich B, Köhrer K, Parham P, Guethlein LA, Norman PJ, Dilthey AT. Complete sequences of six major histocompatibility complex haplotypes, including all the major MHC class II structures. HLA 2023; 102:28-43. [PMID: 36932816 PMCID: PMC10986641 DOI: 10.1111/tan.15020] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/30/2022] [Revised: 02/10/2023] [Accepted: 02/24/2023] [Indexed: 03/19/2023]
Abstract
Accurate and comprehensive immunogenetic reference panels are key to the successful implementation of population-scale immunogenomics. The 5Mbp Major Histocompatibility Complex (MHC) is the most polymorphic region of the human genome and associated with multiple immune-mediated diseases, transplant matching and therapy responses. Analysis of MHC genetic variation is severely complicated by complex patterns of sequence variation, linkage disequilibrium and a lack of fully resolved MHC reference haplotypes, increasing the risk of spurious findings on analyzing this medically important region. Integrating Illumina, ultra-long Nanopore, and PacBio HiFi sequencing as well as bespoke bioinformatics, we completed five of the alternative MHC reference haplotypes of the current (GRCh38/hg38) build of the human reference genome and added one other. The six assembled MHC haplotypes encompass the DR1 and DR4 haplotype structures in addition to the previously completed DR2 and DR3, as well as six distinct classes of the structurally variable C4 region. Analysis of the assembled haplotypes showed that MHC class II sequence structures, including repeat element positions, are generally conserved within the DR haplotype supergroups, and that sequence diversity peaks in three regions around HLA-A, HLA-B+C, and the HLA class II genes. Demonstrating the potential for improved short-read analysis, the number of proper read pairs recruited to the MHC was found to be increased by 0.06%-0.49% in a 1000 Genomes Project read remapping experiment with seven diverse samples. Furthermore, the assembled haplotypes can serve as references for the community and provide the basis of a structurally accurate genotyping graph of the complete MHC region.
Affiliation(s)
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Nicholas R. Pollock
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
| | - William H. Palmer
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
| | - Katherine M. Kichula
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
| | - Daniel Strelow
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Duyen B. Le
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Dana Belick
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Tobias Lautwein
- Biologisch-Medizinisches-Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Thorsten Wachtmeister
- Biologisch-Medizinisches-Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Tassilo E. Wollenweber
- Biologisch-Medizinisches-Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Birgit Henrich
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Karl Köhrer
- Biologisch-Medizinisches-Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Peter Parham
- Department of Structural Biology, and Department of Microbiology and Immunology, Stanford University, Stanford, California, USA
| | - Lisbeth A. Guethlein
- Department of Structural Biology, and Department of Microbiology and Immunology, Stanford University, Stanford, California, USA
| | - Paul J. Norman
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
| | - Alexander T. Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| |
|
23
|
Romain S, Lemaitre C. SVJedi-graph: improving the genotyping of close and overlapping structural variants with long reads using a variation graph. Bioinformatics 2023; 39:i270-i278. [PMID: 37387169 DOI: 10.1093/bioinformatics/btad237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Structural variation (SV) is a class of genetic diversity whose importance is increasingly revealed by genome resequencing, especially with long-read technologies. One crucial problem when analyzing and comparing SVs in several individuals is their accurate genotyping, that is, determining whether a described SV is present or absent in one sequenced individual, and if present, in how many copies. There are only a few methods dedicated to SV genotyping with long-read data, and all either suffer from a bias toward the reference allele by not representing all alleles equally, or have difficulties genotyping close or overlapping SVs due to a linear representation of the alleles. RESULTS: We present SVJedi-graph, a novel method for SV genotyping that relies on a variation graph to represent in a single data structure all alleles of a set of SVs. The long reads are mapped on the variation graph and the resulting alignments that cover allele-specific edges in the graph are used to estimate the most likely genotype for each SV. Running SVJedi-graph on simulated sets of close and overlapping deletions showed that this graph model prevents the bias toward the reference alleles and allows maintaining high genotyping accuracy whatever the SV proximity, contrary to other state-of-the-art genotypers. On the human gold standard HG002 dataset, SVJedi-graph obtained the best performances, genotyping 99.5% of the high confidence SV callset with an accuracy of 95% in less than 30 min. AVAILABILITY AND IMPLEMENTATION: SVJedi-graph is distributed under an AGPL license and available on GitHub at https://github.com/SandraLouise/SVJedi-graph and as a BioConda package.
Affiliation(s)
- Sandra Romain
  - Univ Rennes, Inria, CNRS, IRISA, Rennes F-35000, France
24
Lange LM, Avenali M, Ellis M, Illarionova A, Keller Sarmiento IJ, Tan AH, Madoev H, Galandra C, Junker J, Roopnarain K, Solle J, Wegel C, Fang ZH, Heutink P, Kumar KR, Lim SY, Valente EM, Nalls M, Blauwendraat C, Singleton A, Mencacci N, Lohmann K, Klein C. Elucidating causative gene variants in hereditary Parkinson's disease in the Global Parkinson's Genetics Program (GP2). NPJ Parkinsons Dis 2023; 9:100. [PMID: 37369645 PMCID: PMC10300084 DOI: 10.1038/s41531-023-00526-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 05/15/2023] [Indexed: 06/29/2023] Open
Abstract
The Monogenic Network of the Global Parkinson's Genetics Program (GP2) aims to create an efficient infrastructure to accelerate the identification of novel genetic causes of Parkinson's disease (PD) and to improve our understanding of already identified genetic causes, such as reduced penetrance and variable clinical expressivity of known disease-causing variants. We aim to perform short- and long-read whole-genome sequencing for up to 10,000 patients with parkinsonism. Important features of this project are global involvement and focusing on historically underrepresented populations.
Affiliation(s)
- Lara M Lange
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Micol Avenali
  - IRCCS Mondino Foundation, Pavia, Italy
  - Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- Melina Ellis
  - Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, NSW, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
- Ai-Huey Tan
  - Division of Neurology, Department of Medicine, and the Mah Pooi Soo and Tan Chin Nam Centre for Parkinson's and Related Disorders, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Harutyun Madoev
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Caterina Galandra
  - IRCCS Mondino Foundation, Pavia, Italy
  - Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Johanna Junker
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Justin Solle
  - Department of Clinical Research, Michael J. Fox Foundation for Parkinson's Research, New York City, NY, USA
- Claire Wegel
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Zih-Hua Fang
  - German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Peter Heutink
  - German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Kishore R Kumar
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - Molecular Medicine Laboratory and Neurology Department, Concord Repatriation General Hospital, The University of Sydney, Concord, NSW, Australia
- Shen-Yang Lim
  - Division of Neurology, Department of Medicine, and the Mah Pooi Soo and Tan Chin Nam Centre for Parkinson's and Related Disorders, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Enza Maria Valente
  - IRCCS Mondino Foundation, Pavia, Italy
  - Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Mike Nalls
  - Data Tecnica International, Washington, DC, USA
  - Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Cornelis Blauwendraat
  - Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
  - Integrative Genomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Andrew Singleton
  - Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
  - Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Niccolo Mencacci
  - Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Katja Lohmann
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Christine Klein
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
25
Iruegas-Bocardo F, Weisberg AJ, Riutta ER, Kilday K, Bonkowski JC, Creswell T, Daughtrey ML, Rane K, Grünwald NJ, Chang JH, Putnam ML. Whole Genome Sequencing-Based Tracing of a 2022 Introduction and Outbreak of Xanthomonas hortorum pv. pelargonii. PHYTOPATHOLOGY 2023; 113:975-984. [PMID: 36515656 DOI: 10.1094/phyto-09-22-0321-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Globalization has made agricultural commodities more accessible, available, and affordable. However, their global movement increases the potential for invasion by pathogens and necessitates development and implementation of sensitive, rapid, and scalable surveillance methods. Here, we used 35 strains, isolated by multiple diagnostic laboratories, as a case study for using whole genome sequence data in a plant disease diagnostic setting. Twenty-seven of the strains were isolated in 2022 and identified as Xanthomonas hortorum pv. pelargonii. Eighteen of these strains originated from material sold by a plant breeding company that had notified clients following a release of infected geranium cuttings. Analyses of whole genome sequences revealed epidemiological links among the 27 strains from different growers that confirmed a common source of the outbreak and uncovered likely secondary spread events within facilities that housed plants originating from different plant breeding companies. Whole genome sequencing data were also analyzed to reveal how preparatory and analytical methods can impact conclusions on outbreaks of clonal pathogenic strains. The results demonstrate the potential power of using whole genome sequencing among a network of diagnostic labs and highlight how sharing such data can help shorten response times to mitigate outbreaks more expediently and precisely than standard methods.
Affiliation(s)
- Alexandra J Weisberg
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331
- Elizabeth R Riutta
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331
- Kameron Kilday
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331
- John C Bonkowski
  - Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907
- Tom Creswell
  - Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907
- Margery L Daughtrey
  - Long Island Horticultural Research and Extension Center, Cornell University, Riverhead, NY 11901
- Karen Rane
  - Department of Entomology, University of Maryland, College Park, MD 20742
- Niklaus J Grünwald
  - Horticultural Crops Research Laboratory, U.S. Department of Agriculture-Agricultural Research Service, Corvallis, OR 97331
- Jeff H Chang
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331
- Melodie L Putnam
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331
26
Stammnitz MR, Gori K, Kwon YM, Harry E, Martin FJ, Billis K, Cheng Y, Baez-Ortega A, Chow W, Comte S, Eggertsson H, Fox S, Hamede R, Jones M, Lazenby B, Peck S, Pye R, Quail MA, Swift K, Wang J, Wood J, Howe K, Stratton MR, Ning Z, Murchison EP. The evolution of two transmissible cancers in Tasmanian devils. Science 2023; 380:283-293. [PMID: 37079675 DOI: 10.1126/science.abq6453] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Indexed: 04/22/2023]
Abstract
Tasmanian devils have spawned two transmissible cancer lineages, named devil facial tumor 1 (DFT1) and devil facial tumor 2 (DFT2). We investigated the genetic diversity and evolution of these clones by analyzing 78 DFT1 and 41 DFT2 genomes relative to a newly assembled, chromosome-level reference. Time-resolved phylogenetic trees reveal that DFT1 first emerged in 1986 (1982 to 1989) and DFT2 in 2011 (2009 to 2012). Subclone analysis documents transmission of heterogeneous cell populations. DFT2 has faster mutation rates than DFT1 across all variant classes, including substitutions, indels, rearrangements, transposable element insertions, and copy number alterations, and we identify a hypermutated DFT1 lineage with defective DNA mismatch repair. Several loci show plausible evidence of positive selection in DFT1 or DFT2, including loss of chromosome Y and inactivation of MGA, but none are common to both cancers. This study reveals the parallel long-term evolution of two transmissible cancers inhabiting a common niche in Tasmanian devils.
Affiliation(s)
- Maximilian R Stammnitz
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Kevin Gori
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Young Mi Kwon
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Edward Harry
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yuanyuan Cheng
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- Adrian Baez-Ortega
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- William Chow
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sebastien Comte
  - School of Natural Sciences, University of Tasmania, Hobart, TAS, Australia
  - Vertebrate Pest Research Unit, NSW Department of Primary Industries, Orange, NSW, Australia
- Samantha Fox
  - Save the Tasmanian Devil Program, Tasmanian Department of Natural Resources and Environment, Hobart, TAS, Australia
  - Toledo Zoo, Toledo, OH, USA
- Rodrigo Hamede
  - School of Natural Sciences, University of Tasmania, Hobart, TAS, Australia
  - CANCEV, Centre de Recherches Ecologiques et Evolutives sur le Cancer, Montpellier, France
- Menna Jones
  - School of Natural Sciences, University of Tasmania, Hobart, TAS, Australia
- Billie Lazenby
  - Save the Tasmanian Devil Program, Tasmanian Department of Natural Resources and Environment, Hobart, TAS, Australia
- Sarah Peck
  - Save the Tasmanian Devil Program, Tasmanian Department of Natural Resources and Environment, Hobart, TAS, Australia
- Ruth Pye
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
- Michael A Quail
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kate Swift
  - Mount Pleasant Laboratories, Tasmanian Department of Natural Resources and Environment, Prospect, TAS, Australia
- Jinhong Wang
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Jonathan Wood
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kerstin Howe
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Michael R Stratton
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Zemin Ning
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Elizabeth P Murchison
  - Transmissible Cancer Group, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
27
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
28
Lu TY, Smaruj PN, Fudenberg G, Mancuso N, Chaisson MJP. The motif composition of variable number tandem repeats impacts gene expression. Genome Res 2023; 33:511-524. [PMID: 37037626 PMCID: PMC10234305 DOI: 10.1101/gr.276768.122] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/17/2022] [Accepted: 03/29/2023] [Indexed: 04/12/2023]
Abstract
Understanding the impact of DNA variation on human traits is a fundamental question in human genetics. Variable number tandem repeats (VNTRs) make up ∼3% of the human genome but are often excluded from association analysis owing to poor read mappability or divergent repeat content. Although methods exist to estimate VNTR length from short-read data, it is known that VNTRs vary in both length and repeat (motif) composition. Here, we use a repeat-pangenome graph (RPGG) constructed on 35 haplotype-resolved assemblies to detect variation in both VNTR length and repeat composition. We align population-scale data from the Genotype-Tissue Expression (GTEx) Consortium to examine how variations in sequence composition may be linked to expression, including cases independent of overall VNTR length. We find that 9422 out of 39,125 VNTRs are associated with nearby gene expression through motif variations, of which only 23.4% are accessible from length. Fine-mapping identifies 174 genes to be likely driven by variation in certain VNTR motifs and not overall length. We highlight two genes, CACNA1C and RNF213, that have expression associated with motif variation, showing the utility of RPGG analysis as a new approach for trait association in multiallelic and highly variable loci.
Affiliation(s)
- Tsung-Yu Lu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Paulina N Smaruj
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Geoffrey Fudenberg
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Nicholas Mancuso
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, California 90033, USA
29
Kirsche M, Prabhu G, Sherman R, Ni B, Battle A, Aganezov S, Schatz MC. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat Methods 2023; 20:408-417. [PMID: 36658279 PMCID: PMC10006329 DOI: 10.1038/s41592-022-01753-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 05/27/2021] [Accepted: 12/15/2022] [Indexed: 01/21/2023]
Abstract
The availability of long reads is revolutionizing studies of structural variants (SVs). However, because SVs vary across individuals and are discovered through imprecise read technologies and methods, they can be difficult to compare. Addressing this, we present Jasmine and Iris (https://github.com/mkirsche/Jasmine/), for fast and accurate SV refinement, comparison and population analysis. Using an SV proximity graph, Jasmine outperforms six widely used comparison methods, including reducing the rate of Mendelian discordance in trio datasets by more than fivefold, and reveals a set of high-confidence de novo SVs confirmed by multiple technologies. We also present a unified callset of 122,813 SVs and 82,379 indels from 31 samples of diverse ancestry sequenced with long reads. We genotype these variants in 1,317 samples from the 1000 Genomes Project and the Genotype-Tissue Expression project with DNA and RNA-sequencing data and assess their widespread impact on gene expression, including within medically relevant genes.
Affiliation(s)
- Melanie Kirsche
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Gautam Prabhu
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Rachel Sherman
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Bohan Ni
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
30
Nagasaki M, Sekiya Y, Asakura A, Teraoka R, Otokozawa R, Hashimoto H, Kawaguchi T, Fukazawa K, Inadomi Y, Murata KT, Ohkawa Y, Yamaguchi I, Mizuhara T, Tokunaga K, Sekiya Y, Hanawa T, Yamada R, Matsuda F. Design and implementation of a hybrid cloud system for large-scale human genomic research. Hum Genome Var 2023; 10:6. [PMID: 36755016 PMCID: PMC9908893 DOI: 10.1038/s41439-023-00231-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 02/10/2023] Open
Abstract
In the field of genomic medical research, the amount of large-scale information continues to increase due to advances in measurement technologies, such as high-performance sequencing and spatial omics, as well as the progress made in genomic cohort studies involving more than one million individuals. Therefore, researchers require more computational resources to analyze this information. Here, we introduce a hybrid cloud system consisting of an on-premise supercomputer, science cloud, and public cloud at the Kyoto University Center for Genomic Medicine in Japan as a solution. This system can flexibly handle various heterogeneous computational resource-demanding bioinformatics tools while scaling the computational capacity. In the hybrid cloud system, we demonstrate the way to properly perform joint genotyping of whole-genome sequencing data for a large population of 11,238, which can be a bottleneck in sequencing data analysis. This system can be one of the reference implementations when dealing with large amounts of genomic medical data in research centers and organizations.
Affiliation(s)
- Masao Nagasaki
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yayoi Sekiya
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Akihiro Asakura
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Ryo Teraoka
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Ryoko Otokozawa
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Hiroki Hashimoto
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Takahisa Kawaguchi
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Keiichiro Fukazawa
  - Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
- Yuichi Inadomi
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ken T Murata
  - ICT Testbed Research and Development Promotion Center National Institute of Information and Communications Technology (NICT), Tokyo, Japan
- Yasuyuki Ohkawa
  - Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Izumi Yamaguchi
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Katsushi Tokunaga
  - Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yuji Sekiya
  - Information Technology Center, The University of Tokyo, Chiba, Japan
- Toshihiro Hanawa
  - Information Technology Center, The University of Tokyo, Chiba, Japan
- Ryo Yamada
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Fumihiko Matsuda
  - Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
  - Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
31
Lundberg M, Mackintosh A, Petri A, Bensch S. Inversions maintain differences between migratory phenotypes of a songbird. Nat Commun 2023; 14:452. [PMID: 36707538 PMCID: PMC9883250 DOI: 10.1038/s41467-023-36167-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/22/2021] [Accepted: 01/18/2023] [Indexed: 01/28/2023] Open
Abstract
Structural rearrangements have been shown to be important in local adaptation and speciation, but have been difficult to reliably identify and characterize in non-model species. Here we combine long reads, linked reads and optical mapping to characterize three divergent chromosome regions in the willow warbler Phylloscopus trochilus, of which two are associated with differences in migration and one with an environmental gradient. We show that there are inversions (0.4-13 Mb) in each of the regions and that the divergence times between inverted and non-inverted haplotypes are similar across the regions (~1.2 Myrs), which is compatible with a scenario where inversions arose in either of two allopatric populations that subsequently hybridized. The improved genomes allow us to detect additional functional differences in the divergent regions, providing candidate genes for migration and adaptations to environmental gradients.
Affiliation(s)
- Max Lundberg
  - Department of Biology, Lund University, Lund, Sweden
- Anna Petri
  - Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
32
Wheeler MM, Stilp AM, Rao S, Halldórsson BV, Beyter D, Wen J, Mihkaylova AV, McHugh CP, Lane J, Jiang MZ, Raffield LM, Jun G, Sedlazeck FJ, Metcalf G, Yao Y, Bis JB, Chami N, de Vries PS, Desai P, Floyd JS, Gao Y, Kammers K, Kim W, Moon JY, Ratan A, Yanek LR, Almasy L, Becker LC, Blangero J, Cho MH, Curran JE, Fornage M, Kaplan RC, Lewis JP, Loos RJF, Mitchell BD, Morrison AC, Preuss M, Psaty BM, Rich SS, Rotter JI, Tang H, Tracy RP, Boerwinkle E, Abecasis GR, Blackwell TW, Smith AV, Johnson AD, Mathias RA, Nickerson DA, Conomos MP, Li Y, Þorsteinsdóttir U, Magnússon MK, Stefansson K, Pankratz ND, Bauer DE, Auer PL, Reiner AP. Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program. Nat Commun 2022; 13:7592. [PMID: 36481753 PMCID: PMC9732337 DOI: 10.1038/s41467-022-35354-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Accepted: 11/29/2022] [Indexed: 12/13/2022] Open
Abstract
Genome-wide association studies have identified thousands of single nucleotide variants and small indels that contribute to variation in hematologic traits. While structural variants are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of structural variants to quantitative blood cell trait variation is unknown. Here we utilized whole genome sequencing data in ancestrally diverse participants of the NHLBI Trans Omics for Precision Medicine program (N = 50,675) to detect structural variants associated with hematologic traits. Using single variant tests, we assessed the association of common and rare structural variants with red cell-, white cell-, and platelet-related quantitative traits and observed 21 independent signals (12 common and 9 rare) reaching genome-wide significance. The majority of these associations (N = 18) replicated in independent datasets. In genome-editing experiments, we provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.
Affiliation(s)
- Marsha M Wheeler
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
| | - Adrienne M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Shuquan Rao
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
| | - Bjarni V Halldórsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
| | | | - Jia Wen
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Anna V Mihkaylova
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Caitlin P McHugh
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - John Lane
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, 55455, USA
| | - Min-Zhi Jiang
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
| | - Goo Jun
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Yao Yao
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
| | - Joshua B Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
| | - Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Paul S de Vries
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Pinkal Desai
- Division of Hematology and Oncology, Weill Cornell Medical College, New York, NY, 10065, USA
| | - James S Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
| | - Yan Gao
- Jackson Heart Study, Department of Medicine, University of Mississippi, Jackson, MS, 39216, USA
| | - Kai Kammers
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Wonji Kim
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
| | - Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Laura Almasy
- Children's Hospital of Philadelphia and University of Pennsylvania School of Medicine, Philadelphia, PA, 19104, USA
| | - Lewis C Becker
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, 78520, USA
| | - Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
| | - Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, 78520, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
| | - Joshua P Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Braxton D Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C Morrison
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Michael Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
| | - Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Russell P Tracy
- Departments of Pathology & Laboratory Medicine and Biochemistry, Larner College of Medicine at the University of Vermont, Colchester, VT, 05446, USA
| | - Eric Boerwinkle
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Goncalo R Abecasis
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
| | - Thomas W Blackwell
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
| | - Albert V Smith
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
| | - Andrew D Johnson
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Framingham, MA, 01702, USA
| | - Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
| | - Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
| | - Yun Li
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Unnur Þorsteinsdóttir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
| | - Magnús K Magnússon
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
| | - Kari Stefansson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
| | - Nathan D Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, 55455, USA
| | - Daniel E Bauer
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
| | - Paul L Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA.
| | - Alex P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98105, USA.
|
33
|
Singh V, Pandey S, Bhardwaj A. From the reference human genome to human pangenome: Premise, promise and challenge. Front Genet 2022; 13:1042550. [PMID: 36437921 PMCID: PMC9684177 DOI: 10.3389/fgene.2022.1042550] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/12/2022] [Accepted: 10/21/2022] [Indexed: 11/11/2022] Open
Abstract
The Reference Human Genome remains the single most important resource for mapping genetic variations and assessing their impact. However, it is monophasic, incomplete and not representative of the variation that exists in the population. Given the extent of ethno-geographic diversity and the consequent diversity in clinical manifestations of these variations, population specific references were developed over time. The dramatically plummeting cost of sequencing whole genomes and the advent of third generation long range sequencers allowing accurate, error free, telomere-to-telomere assemblies of human genomes present us with a unique and unprecedented opportunity to develop a more composite standard reference consisting of a collection of multiple genomes that capture the maximal variation existing in the population, with the deepest annotation possible, enabling a realistic, reliable and actionable estimation of clinical significance of specific variations. The Human Pangenome Project thus is a logical next step promising a more accurate and global representation of genomic variations. The pangenome effort must be reciprocally complemented with precise variant discovery tools and exhaustive annotation to ensure unambiguous clinical assessment of the variant in ethno-geographical context. Here we discuss a broad roadmap, the challenges and way forward in developing a universal pangenome reference including data visualization techniques and integration of prior knowledge base in the new graph based architecture and tools to submit, compare, query, annotate and retrieve relevant information from the pangenomes. The biggest challenge, however, will be the ethical, legal and social implications and the training of human resource to the new reference paradigm.
Affiliation(s)
- Vipin Singh
- University Institute of Biotechnology, Chandigarh University, Mohali, India
| | - Shweta Pandey
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Anshu Bhardwaj
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- *Correspondence: Anshu Bhardwaj,
|
34
|
Chan SH, Bylstra Y, Teo JX, Kuan JL, Bertin N, Gonzalez-Porta M, Hebrard M, Tirado-Magallanes R, Tan JHJ, Jeyakani J, Li Z, Chai JF, Chong YS, Davila S, Goh LL, Lee ES, Wong E, Wong TY, Prabhakar S, Liu J, Cheng CY, Eisenhaber B, Karnani N, Leong KP, Sim X, Yeo KK, Chambers JC, Tai ES, Tan P, Jamuar SS, Ngeow J, Lim WK, Gluckman PD, Goh DLM, Jain K, Kam S, Kassam I, Lakshmanan LN, Lee CG, Lee J, Lee SC, Lee YS, Li H, Lim CW, Lim TH, Loh M, Maurer-Stroh S, Mina TH, Mok SQ, Ng HK, Pua CJ, Riboli E, Rim TH, Sabanayagam C, Sim WC, Subramaniam T, Tan ES, Tan EK, Tantoso E, Tay D, Teo YY, Tham YC, Toh LXG, Tsai PK, van Dam RM, Veeravalli L, Khin-lin GW, Wilm A, Yang C, Yap F, Yew YW, Prabhakar S, Liu J, Cheng CY, Eisenhaber B, Karnani N, Leong KP, Sim X, Yeo KK, Chambers JC, Tai ES, Tan P, Jamuar SS, Ngeow J, Lim WK. Analysis of clinically relevant variants from ancestrally diverse Asian genomes. Nat Commun 2022; 13:6694. [PMID: 36335097 PMCID: PMC9637116 DOI: 10.1038/s41467-022-34116-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/14/2022] [Accepted: 10/12/2022] [Indexed: 11/06/2022] Open
Abstract
Asian populations are under-represented in human genomics research. Here, we characterize clinically significant genetic variation in 9051 genomes representing East Asian, South Asian, and severely under-represented Austronesian-speaking Southeast Asian ancestries. We observe disparate genetic risk burden attributable to ancestry-specific recurrent variants and identify individuals with variants specific to ancestries discordant to their self-reported ethnicity, mostly due to cryptic admixture. About 27% of severe recessive disorder genes with appreciable carrier frequencies in Asians are missed by carrier screening panels, and we estimate 0.5% Asian couples at-risk of having an affected child. Prevalence of medically-actionable variant carriers is 3.4% and a further 1.6% harbour variants with potential for pathogenic classification upon additional clinical/experimental evidence. We profile 23 pharmacogenes with high-confidence gene-drug associations and find 22.4% of Asians at-risk of Centers for Disease Control and Prevention Tier 1 genetic conditions concurrently harbour pharmacogenetic variants with actionable phenotypes, highlighting the benefits of pre-emptive pharmacogenomics. Our findings illuminate the diversity in genetic disease epidemiology and opportunities for precision medicine for a large, diverse Asian population.
Affiliation(s)
- Sock Hoai Chan
- Cancer Genetics Service, Division of Medical Oncology, National Cancer Centre Singapore, Singapore, 169610 Singapore
- Oncology Academic Clinical Program, Duke-NUS Medical School, Singapore, 169857 Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232 Singapore
| | - Yasmin Bylstra
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
| | - Jing Xian Teo
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
| | - Jyn Ling Kuan
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
| | - Nicolas Bertin
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Mar Gonzalez-Porta
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Maxime Hebrard
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Roberto Tirado-Magallanes
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Joanna Hui Juan Tan
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Justin Jeyakani
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Zhihui Li
- Genome Research Informatics & Data Science Platform, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Jin Fang Chai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, 117549 Singapore
| | - Yap Seng Chong
- Department of Obstetrics & Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
- Singapore Institute for Clinical Sciences, Singapore, 117609 Singapore
| | - Sonia Davila
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
- Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, Singapore, 169857 Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, 168582 Singapore
| | - Liuh Ling Goh
- Personalized Medicine Service, Tan Tock Seng Hospital, Singapore, 308433 Singapore
| | - Eng Sing Lee
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232 Singapore
- National Healthcare Group Polyclinics, Singapore, 138543 Singapore
| | - Eleanor Wong
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Tien Yin Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, 168751 Singapore
| | - Shyam Prabhakar
- Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
| | - Jianjun Liu
- Human Genomics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
| | - Ching-Yu Cheng
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, 168751 Singapore
- Ophthalmology & Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore, 169857 Singapore
| | - Birgit Eisenhaber
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671 Singapore
| | - Neerja Karnani
- Human Development, Singapore Institute for Clinical Sciences, Singapore, 117609 Singapore
- Clinical Data Engagement, Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, 138671 Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596 Singapore
| | - Khai Pang Leong
- Personalized Medicine Service, Tan Tock Seng Hospital, Singapore, 308433 Singapore
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore, 308433 Singapore
| | - Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, 117549 Singapore
| | - Khung Keong Yeo
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
- Department of Cardiology, National Heart Centre Singapore, Singapore, 169609 Singapore
- Duke-NUS Medical School, Singapore, 169857 Singapore
| | - John C. Chambers
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232 Singapore
- Precision Health Research Singapore (PRECISE), Singapore, 139234 Singapore
- Department of Epidemiology and Biostatistics, Imperial College London, London, W2 1PG UK
| | - E-Shyong Tai
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, 117549 Singapore
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
- Duke-NUS Medical School, Singapore, 169857 Singapore
- Precision Health Research Singapore (PRECISE), Singapore, 139234 Singapore
| | - Patrick Tan
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, 138672 Singapore
- Precision Health Research Singapore (PRECISE), Singapore, 139234 Singapore
- Cancer & Stem Cell Biology Program, Duke-NUS Medical School, Singapore, 169857 Singapore
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, 117599 Singapore
| | - Saumya S. Jamuar
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, 168582 Singapore
- Genetics Service, Department of Paediatrics, KK Women’s and Children’s Hospital, Singapore, 229899 Singapore
- Paediatric Academic Clinical Program, Duke-NUS Medical School, Singapore, 169857 Singapore
| | - Joanne Ngeow
- Cancer Genetics Service, Division of Medical Oncology, National Cancer Centre Singapore, Singapore, 169610 Singapore
- Oncology Academic Clinical Program, Duke-NUS Medical School, Singapore, 169857 Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232 Singapore
- Institute of Molecular and Cellular Biology, Agency for Science, Technology and Research, Singapore, 138673 Singapore
| | - Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore, 169609 Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, 168582 Singapore
- Cancer & Stem Cell Biology Program, Duke-NUS Medical School, Singapore, 169857 Singapore
|
35
|
Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO, Gee HY, Oh J, Jang IJ, Lee S, Baek D, Koh Y, Yoon SS, Kim YJ, Chae JH, Park WY, Bhak JH, Choi M. A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Exp Mol Med 2022; 54:1862-1871. [PMID: 36323850 PMCID: PMC9628380 DOI: 10.1038/s12276-022-00871-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/30/2022] [Revised: 07/21/2022] [Accepted: 08/08/2022] [Indexed: 11/29/2022] Open
Abstract
Despite substantial advances in disease genetics, studies to date have largely focused on individuals of European descent. This limits further discoveries of novel functional genetic variants in other ethnic groups. To alleviate the paucity of East Asian population genome resources, we established the Korean Variant Archive 2 (KOVA 2), which is composed of 1896 whole-genome sequences and 3409 whole-exome sequences from healthy individuals of Korean ethnicity. This is the largest genome database from the ethnic Korean population to date, surpassing the 1909 Korean individuals deposited in gnomAD. The variants in KOVA 2 displayed all the known genetic features of those from previous genome databases, and we compiled data from Korean-specific runs of homozygosity, positively selected intervals, and structural variants. In doing so, we found loci, such as the loci of ADH1A/1B and UHRF1BP1, that are strongly selected in the Korean population relative to other East Asian populations. Our analysis of allele ages revealed a correlation between variant functionality and evolutionary age. The data can be browsed and downloaded from a public website ( https://www.kobic.re.kr/kova/ ). We anticipate that KOVA 2 will serve as a valuable resource for genetic studies involving East Asian populations.
Affiliation(s)
- Jeongeun Lee
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea
| | - Jean Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Sungwon Jeon
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Jeongha Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Insu Jang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jin Ok Yang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
| | - Soojin Park
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byungwook Lee
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jinwook Choi
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea
- Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byung-Ok Choi
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351 Republic of Korea
| | - Heon Yung Gee
- Department of Pharmacology, Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, Seoul, 03722 Republic of Korea
| | - Jaeseong Oh
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - In-Jin Jang
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - Sanghyuk Lee
- Department of Bio-Information Science, Ewha Womans University, Seoul, 03760 Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, 08826 Republic of Korea
| | - Youngil Koh
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Sung-Soo Yoon
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Young-Joon Kim
- Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, 03722 Republic of Korea
| | - Jong-Hee Chae
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, 06351 Republic of Korea
| | - Jong Hwa Bhak
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Murim Choi
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
|
36
|
Tetikol HS, Turgut D, Narci K, Budak G, Kalay O, Arslan E, Demirkaya-Budak S, Dolgoborodov A, Kabakci-Zorlu D, Semenyuk V, Jain A, Davis-Dusenbery BN. Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis. Nat Commun 2022; 13:4384. [PMID: 35927245 PMCID: PMC9352875 DOI: 10.1038/s41467-022-31724-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Accepted: 06/30/2022] [Indexed: 11/29/2022] Open
Abstract
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.
Affiliation(s)
- Kubra Narci
- Seven Bridges Genomics, Charlestown, MA, USA
| | - Ozem Kalay
- Seven Bridges Genomics, Charlestown, MA, USA
| | - Elif Arslan
- Seven Bridges Genomics, Charlestown, MA, USA
| | - Amit Jain
- Seven Bridges Genomics, Charlestown, MA, USA
|
37
|
Deaton AM, Dubey A, Ward LD, Dornbos P, Flannick J, Yee E, Ticau S, Noetzli L, Parker MM, Hoffing RA, Willis C, Plekan ME, Holleman AM, Hinkle G, Fitzgerald K, Vaishnaw AK, Nioi P. Rare loss of function variants in the hepatokine gene INHBE protect from abdominal obesity. Nat Commun 2022; 13:4319. [PMID: 35896531 PMCID: PMC9329324 DOI: 10.1038/s41467-022-31757-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/20/2021] [Accepted: 07/01/2022] [Indexed: 02/07/2023] Open
Abstract
Identifying genetic variants associated with lower waist-to-hip ratio can reveal new therapeutic targets for abdominal obesity. We use exome sequences from 362,679 individuals to identify genes associated with waist-to-hip ratio adjusted for BMI (WHRadjBMI), a surrogate for abdominal fat that is causally linked to type 2 diabetes and coronary heart disease. Predicted loss of function (pLOF) variants in INHBE associate with lower WHRadjBMI and this association replicates in data from AMP-T2D-GENES. INHBE encodes a secreted protein, the hepatokine activin E. In vitro characterization of the most common INHBE pLOF variant in our study indicates an in-frame deletion resulting in a 90% reduction in secreted protein levels. We detect associations with lower WHRadjBMI for variants in ACVR1C, encoding an activin receptor, further highlighting the involvement of activins in regulating fat distribution. These findings highlight activin E as a potential therapeutic target for abdominal obesity, a phenotype linked to cardiometabolic disease.
Affiliation(s)
- Peter Dornbos
- Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Jason Flannick
- Programs in Metabolism and Medical & Population Genetics, Broad Institute, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Elaine Yee
- Alnylam Pharmaceuticals, Cambridge, MA, USA
| | - Paul Nioi
- Alnylam Pharmaceuticals, Cambridge, MA, USA
| |
Collapse
|
38
|
Hunt M, Letcher B, Malone KM, Nguyen G, Hall MB, Colquhoun RM, Lima L, Schatz MC, Ramakrishnan S, Iqbal Z. Minos: variant adjudication and joint genotyping of cohorts of bacterial genomes. Genome Biol 2022; 23:147. [PMID: 35791022 PMCID: PMC9254434 DOI: 10.1186/s13059-022-02714-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/06/2021] [Accepted: 06/20/2022] [Indexed: 12/30/2022]
Abstract
There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables joint genotyping; we demonstrate on a large (N=13k) M. tuberculosis cohort, building a map of non-synonymous SNPs and indels in a region where all such variants are assumed to cause rifampicin resistance. We quantify the correlation with phenotypic resistance and then replicate in a second cohort (N=10k).
Affiliation(s)
- Martin Hunt
  - EMBL-EBI, Cambridge, UK
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Rachel M Colquhoun
  - Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

39
Halldorsson BV, Eggertsson HP, Moore KHS, Hauswedell H, Eiriksson O, Ulfarsson MO, Palsson G, Hardarson MT, Oddsson A, Jensson BO, Kristmundsdottir S, Sigurpalsdottir BD, Stefansson OA, Beyter D, Holley G, Tragante V, Gylfason A, Olason PI, Zink F, Asgeirsdottir M, Sverrisson ST, Sigurdsson B, Gudjonsson SA, Sigurdsson GT, Halldorsson GH, Sveinbjornsson G, Norland K, Styrkarsdottir U, Magnusdottir DN, Snorradottir S, Kristinsson K, Sobech E, Jonsson H, Geirsson AJ, Olafsson I, Jonsson P, Pedersen OB, Erikstrup C, Brunak S, Ostrowski SR, Thorleifsson G, Jonsson F, Melsted P, Jonsdottir I, Rafnar T, Holm H, Stefansson H, Saemundsdottir J, Gudbjartsson DF, Magnusson OT, Masson G, Thorsteinsdottir U, Helgason A, Jonsson H, Sulem P, Stefansson K. The sequences of 150,119 genomes in the UK Biobank. Nature 2022; 607:732-740. [PMID: 35859178 PMCID: PMC9329122 DOI: 10.1038/s41586-022-04965-x] [Citation(s) in RCA: 150] [Impact Index Per Article: 75.0] [Received: 11/05/2021] [Accepted: 06/10/2022] [Indexed: 12/25/2022]
Abstract
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data [1,2]. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank [3]. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Affiliation(s)
- Bjarni V Halldorsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Magnus O Ulfarsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Marteinn T Hardarson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Snaedis Kristmundsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Brynja D Sigurpalsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Technology, Reykjavik University, Reykjavik, Iceland
- Helgi Jonsson
  - Landspitali-University Hospital, Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Palmi Jonsson
  - Landspitali-University Hospital, Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Ole Birger Pedersen
  - Department of Clinical Immunology, Zealand University Hospital, Køge, Denmark
- Christian Erikstrup
  - Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
  - Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark
- Søren Brunak
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Sisse Rye Ostrowski
  - Department of Clinical Immunology, Copenhagen University Hospital (Rigshospitalet), Copenhagen, Denmark
  - Department of Clinical Medicine, Faculty of Health and Clinical Sciences, Copenhagen University, Copenhagen, Denmark
- Pall Melsted
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Ingileif Jonsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Hilma Holm
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Daniel F Gudbjartsson
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Unnur Thorsteinsdottir
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Agnar Helgason
  - deCODE genetics/Amgen Inc., Reykjavik, Iceland
  - Department of Anthropology, University of Iceland, Reykjavik, Iceland

40
Quan C, Lu H, Lu Y, Zhou G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput Struct Biotechnol J 2022; 20:2639-2647. [PMID: 35685364 PMCID: PMC9163579 DOI: 10.1016/j.csbj.2022.05.047] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022]
Abstract
Population-scale studies of structural variation (SV) are growing rapidly worldwide with the development of long-read sequencing technology, yielding a considerable number of novel SVs and complete gap-closed genome assemblies. Herein, we highlight recent studies using a hybrid sequencing strategy and present the challenges toward large-scale genotyping for SVs due to the reference bias. Genotyping SVs at a population scale remains challenging, which severely impacts genotype-based population genetic studies or genome-wide association studies of complex diseases. We summarize academic efforts to improve genotype quality through linear or graph representations of reference and alternative alleles. Graph-based genotypers capable of integrating diverse genetic information are effectively applied to large and diverse cohorts, contributing to unbiased downstream analysis. Meanwhile, there is still an urgent need in this field for efficient tools to construct complex graphs and perform sequence-to-graph alignments.
Affiliation(s)
- Cheng Quan
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Hao Lu
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
- Yiming Lu
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
  - Hebei University, Baoding, Hebei Province 071002, PR China
  - Corresponding authors at: Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, PR China (G. Zhou). Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing, 100850, PR China (Y. Lu).
- Gangqiao Zhou
  - Department of Genetics & Integrative Omics, State Key Laboratory of Proteomics, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 100850, PR China
  - Collaborative Innovation Center for Personalized Cancer Medicine, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu Province 211166, PR China
  - Medical College of Guizhou University, Guiyang, Guizhou Province 550025, PR China
  - Hebei University, Baoding, Hebei Province 071002, PR China

41
Duan X, Pan M, Fan S. Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data. BMC Genomics 2022; 23:324. [PMID: 35461238 PMCID: PMC9034514 DOI: 10.1186/s12864-022-08548-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/07/2022] [Accepted: 04/11/2022] [Indexed: 12/28/2022]
Abstract
Background: Structural variants (SVs) play a crucial role in gene regulation, trait association, and disease in humans. SV genotyping has been extensively applied in genomics research and clinical diagnosis. Although a growing number of SV genotyping methods for long reads have been developed, a comprehensive performance assessment of these methods has yet to be done.
Results: Based on one simulated and three real SV datasets, we performed an in-depth evaluation of five SV genotyping methods, including cuteSV, LRcaller, Sniffles, SVJedi, and VaPoR. The results show that for insertions and deletions, cuteSV and LRcaller have similar F1 scores (cuteSV, insertions: 0.69–0.90, deletions: 0.77–0.90 and LRcaller, insertions: 0.67–0.87, deletions: 0.74–0.91) and are superior to other methods. For duplications, inversions, and translocations, LRcaller yields the most accurate genotyping results (0.84, 0.68, and 0.47, respectively). When genotyping SVs located in tandem repeat region or with imprecise breakpoints, cuteSV (insertions and deletions) and LRcaller (duplications, inversions, and translocations) are better than other methods. In addition, we observed a decrease in F1 scores when the SV size increased. Finally, our analyses suggest that the F1 scores of these methods reach the point of diminishing returns at 20× depth of coverage.
Conclusions: We present an in-depth benchmark study of long-read SV genotyping methods. Our results highlight the advantages and disadvantages of each genotyping method, which provide practical guidance for optimal application selection and prospective directions for tool improvement.
Affiliation(s)
- Xiaoke Duan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 200438, China
  - MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200433, China
- Mingpei Pan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 200438, China
  - MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200433, China
- Shaohua Fan
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai, 200438, China

42
The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022; 604:437-446. [PMID: 35444317 DOI: 10.1038/s41586-022-04601-8] [Citation(s) in RCA: 148] [Impact Index Per Article: 74.0] [Received: 08/30/2021] [Accepted: 03/01/2022] [Indexed: 12/20/2022]
Abstract
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.

43
Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, Dilthey AT, Marschall T. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet 2022; 54:518-525. [PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Received: 02/15/2021] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
Affiliation(s)
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Rausch
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
  - European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany
- Peter A Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Torsten Houwaart
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Yafei Mao
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Alexander T Dilthey
  - Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
  - Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

44
Zanini SF, Bayer PE, Wells R, Snowdon RJ, Batley J, Varshney RK, Nguyen HT, Edwards D, Golicz AA. Pangenomics in crop improvement-from coding structural variations to finding regulatory variants with pangenome graphs. Plant Genome 2022; 15:e20177. [PMID: 34904403 DOI: 10.1002/tpg2.20177] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 05/14/2021] [Accepted: 10/07/2021] [Indexed: 05/15/2023]
Abstract
Since the first reported crop pangenome in 2014, advances in high-throughput and cost-effective DNA sequencing technologies facilitated multiple such studies including the pangenomes of oilseed rape (Brassica napus L.), soybean [Glycine max (L.) Merr.], rice (Oryza sativa L.), wheat (Triticum aestivum L.), and barley (Hordeum vulgare L.). Compared with single-reference genomes, pangenomes provide a more accurate representation of the genetic variation present in a species. By combining the genomic data of multiple accessions, pangenomes allow for the detection and annotation of complex DNA polymorphisms such as structural variations (SVs), one of the major determinants of genetic diversity within a species. In this review we summarize the current literature on crop pangenomics, focusing on their application to find candidate SVs involved in traits of agronomic interest. We then highlight the potential of pangenomes in the discovery and functional characterization of noncoding regulatory sequences and their variations. We conclude with a summary and outlook on innovative data structures representing the complete content of plant pangenomes including annotations of coding and noncoding elements and outcomes of transcriptomic and epigenomic experiments.
Affiliation(s)
- Silvia F Zanini
  - Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany
- Philipp E Bayer
  - School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
- Rachel Wells
  - Dep. of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR47UH, UK
- Rod J Snowdon
  - Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany
- Jacqueline Batley
  - School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
- Rajeev K Varshney
  - Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
  - State Agricultural Biotechnology Centre, Centre for Crop Food Innovation, Food Futures Institute, Murdoch Univ., Murdoch, WA, Australia
- Henry T Nguyen
  - Division of Plant Sciences, Univ. of Missouri, Columbia, MO, USA
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, Univ. of Western Australia, Perth, Western Australia, Australia
- Agnieszka A Golicz
  - Dep. of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig Univ. Giessen, Giessen, 35392, Germany

45
Hansen CCR, Westfall KM, Pálsson S. Evaluation of four methods to identify the homozygotic sex chromosome in small populations. BMC Genomics 2022; 23:160. [PMID: 35209843 PMCID: PMC8867824 DOI: 10.1186/s12864-022-08393-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/10/2021] [Accepted: 02/15/2022] [Indexed: 11/10/2022]
Abstract
Background: Whole genomes are commonly assembled into a collection of scaffolds and often lack annotations of autosomes, sex chromosomes, and organelle genomes (i.e., mitochondrial and chloroplast). As these chromosome types differ in effective population size and can have highly disparate evolutionary histories, it is imperative to take this information into account when analysing genomic variation. Here we assessed the accuracy of four methods for identifying the homogametic sex chromosome in a small population using two whole genome sequences (WGS) and 133 RAD sequences of white-tailed eagles (Haliaeetus albicilla): i) difference in read depth per scaffold in a male and a female, ii) heterozygosity per scaffold in a male and a female, iii) mapping to the reference genome of a related species (chicken) with annotated sex chromosomes, and iv) analysis of SNP-loadings from a principal components analysis (PCA), based on the low-depth RADseq data.
Results: The best performing approach was the reference mapping (method iii), which identified 98.12% of the expected homogametic sex chromosome (Z). Read depth per scaffold (method i) identified 86.41% of the homogametic sex chromosome with few false positives. SNP-loading scores (method iv) identified 78.6% of the Z-chromosome and had a false positive discovery rate of more than 10%. Heterozygosity per scaffold (method ii) did not provide clear results due to a lack of diversity in both the Z and autosomal chromosomes, and potential interference from the heterogametic sex chromosome (W). The evaluation of these methods also revealed 10 Mb of putative PAR and gametologous regions.
Conclusion: Identification of the homogametic sex chromosome in a small population is best accomplished by reference mapping or examining differences in read depth between sexes.
Affiliation(s)
- Kristen M Westfall
  - Department of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland
  - Current: Fisheries and Oceans Canada, Pacific Biological Station, Nanaimo, BC, Canada
- Snæbjörn Pálsson
  - Department of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland

46
Zhang C, Hansen MEB, Tishkoff SA. Advances in integrative African genomics. Trends Genet 2022; 38:152-168. [PMID: 34740451 PMCID: PMC8752515 DOI: 10.1016/j.tig.2021.09.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/16/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 12/16/2022]
Abstract
There has been a rapid increase in human genome sequencing in the past two decades, resulting in the identification of millions of previously unknown genetic variants. However, African populations are under-represented in sequencing efforts. Additional sequencing from diverse African populations and the construction of African-specific reference genomes is needed to better characterize the full spectrum of variation in humans. However, sequencing alone is insufficient to address the molecular and cellular mechanisms underlying variable phenotypes and disease risks. Determining functional consequences of genetic variation using multi-omics approaches is a fundamental post-genomic challenge. We discuss approaches to close the knowledge gaps about African genomic diversity and review advances in African integrative genomic studies and their implications for precision medicine.
Affiliation(s)
- Chao Zhang
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew E B Hansen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A Tishkoff
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA

47
Bergeron LA, Besenbacher S, Turner T, Versoza CJ, Wang RJ, Price AL, Armstrong E, Riera M, Carlson J, Chen HY, Hahn MW, Harris K, Kleppe AS, López-Nandam EH, Moorjani P, Pfeifer SP, Tiley GP, Yoder AD, Zhang G, Schierup MH. The mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates. eLife 2022; 11:73577. [PMID: 35018888 PMCID: PMC8830884 DOI: 10.7554/elife.73577] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/02/2021] [Accepted: 01/11/2022] [Indexed: 11/13/2022]
Abstract
In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and appropriately accounting for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a ‘Mutationathon,’ a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details into the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.
Affiliation(s)
- Lucie A Bergeron
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Søren Besenbacher
  - Department of Molecular Medicine (MOMA), Aarhus University, Aarhus N, Denmark
- Tychele Turner
  - Department of Genetics, Washington University in St. Louis, Saint Louis, United States
- Cyril J Versoza
  - Center for Evolution and Medicine, Arizona State University, Tempe, United States
- Richard J Wang
  - Department of Biology, Indiana University, Bloomington, United States
- Alivia Lee Price
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Ellie Armstrong
  - Department of Biology, Stanford University, Stanford, United States
- Meritxell Riera
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Jedidiah Carlson
  - Department of Genome Sciences, University of Washington, Seattle, United States
- Hwei-Yen Chen
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matthew W Hahn
  - Department of Biology, Indiana University, Bloomington, United States
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, United States
- Priya Moorjani
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
- Susanne P Pfeifer
  - School of Life Sciences, Arizona State University, Tempe, United States
- George P Tiley
  - Department of Biology, Duke University, Durham, United States
- Anne D Yoder
  - Department of Biology, Duke University, Durham, United States
- Guojie Zhang
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark

48
Nawaz MS, Einarsson G, Bustamante M, Gisladottir RS, Walters GB, Jonsdottir GA, Skuladottir AT, Bjornsdottir G, Magnusson SH, Asbjornsdottir B, Unnsteinsdottir U, Sigurdsson E, Jonsson PV, Palmadottir VK, Gudjonsson SA, Halldorsson GH, Ferkingstad E, Jonsdottir I, Thorleifsson G, Holm H, Thorsteinsdottir U, Sulem P, Gudbjartsson DF, Stefansson H, Thorgeirsson TE, Ulfarsson MO, Stefansson K. Thirty novel sequence variants impacting human intracranial volume. Brain Commun 2022; 4:fcac271. [PMID: 36415660 PMCID: PMC9677475 DOI: 10.1093/braincomms/fcac271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 06/16/2022] [Accepted: 10/20/2022] [Indexed: 11/14/2022]
Abstract
Intracranial volume, measured through magnetic resonance imaging and/or estimated from head circumference, is heritable and correlates with cognitive traits and several neurological disorders. We performed a genome-wide association study meta-analysis of intracranial volume (n = 79 174) and found 64 associating sequence variants explaining 5.0% of its variance. We used coding variation, transcript and protein levels, to uncover 12 genes likely mediating the effect of these variants, including GLI3 and CDK6 that affect cranial synostosis and microcephaly, respectively. Intracranial volume correlates genetically with volumes of cortical and sub-cortical regions, cognition, learning, neonatal and neurological traits. Parkinson's disease cases have greater and attention deficit hyperactivity disorder cases smaller intracranial volume than controls. Our Mendelian randomization studies indicate that intracranial volume associated variants either increase the risk of Parkinson's disease and decrease the risk of attention deficit hyperactivity disorder and neuroticism or correlate closely with a confounder.
Affiliation(s)
- Muhammad Sulaman Nawaz
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Vatnsmyrarvegur 16, 101 Reykjavik, Iceland
- Rosa S Gisladottir
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; School of Humanities, University of Iceland, Saemundargata 2, 102 Reykjavik, Iceland
- G Bragi Walters
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Vatnsmyrarvegur 16, 101 Reykjavik, Iceland
- Engilbert Sigurdsson
- Faculty of Medicine, School of Health Sciences, University of Iceland, Vatnsmyrarvegur 16, 101 Reykjavik, Iceland; Department of Psychiatry, Landspitali-National University Hospital, Hringbraut 101, 101 Reykjavik, Iceland
- Palmi V Jonsson
- Faculty of Medicine, School of Health Sciences, University of Iceland, Vatnsmyrarvegur 16, 101 Reykjavik, Iceland; Department of Geriatric Medicine, Landspitali University Hospital, Hringbraut 101, 101 Reykjavik, Iceland
- Vala Kolbrun Palmadottir
- Department of Internal Medicine, Landspitali University Hospital, Hringbraut 101, 101 Reykjavik, Iceland
- Gisli H Halldorsson
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; School of Engineering and Natural Sciences, University of Iceland, Taeknigardur, Dunhagi 5, 107 Reykjavik, Iceland
- Egil Ferkingstad
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland
- Hilma Holm
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland
- Patrick Sulem
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland
- Magnus O Ulfarsson
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; Faculty of Electrical and Computer Engineering, University of Iceland, Taeknigardur, Dunhagi 5, 107 Reykjavik, Iceland
- Kari Stefansson
- deCODE genetics/Amgen Inc., Sturlugata 8, 102 Reykjavik, Iceland; Faculty of Medicine, School of Health Sciences, University of Iceland, Vatnsmyrarvegur 16, 101 Reykjavik, Iceland
49
Sirén J, Monlong J, Chang X, Novak AM, Eizenga JM, Markello C, Sibbesen JA, Hickey G, Chang PC, Carroll A, Gupta N, Gabriel S, Blackwell TW, Ratan A, Taylor KD, Rich SS, Rotter JI, Haussler D, Garrison E, Paten B. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 2021; 374:abg8871. [PMID: 34914532 PMCID: PMC9365333 DOI: 10.1126/science.abg8871] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Indexed: 12/12/2022]
Abstract
We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe maps sequencing reads to thousands of human genomes at a speed comparable to that of standard methods mapping to a single reference genome. The increased mapping accuracy enables downstream improvements in genome-wide genotyping pipelines for both small variants and larger structural variants. We used Giraffe to genotype 167,000 structural variants, discovered in long-read studies, in 5202 diverse human genomes that were sequenced using short reads. We conclude that pangenomics facilitates a more comprehensive characterization of variation and, as a result, has the potential to improve many genomic analyses.
Affiliation(s)
- Jouni Sirén
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Xian Chang
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Adam M. Novak
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Pi-Chuan Chang
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Namrata Gupta
- Genomics Platform, Broad Institute, Cambridge, MA, USA
- Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- David Haussler
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
50
Lee WP, Tucci AA, Conery M, Leung YY, Kuzma AB, Valladares O, Chou YF, Lu W, Wang LS, Schellenberg GD, Tzeng JY. Copy Number Variation Identification on 3,800 Alzheimer's Disease Whole Genome Sequencing Data from the Alzheimer's Disease Sequencing Project. Front Genet 2021; 12:752390. [PMID: 34804120 PMCID: PMC8599981 DOI: 10.3389/fgene.2021.752390] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/03/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
Alzheimer's Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in the etiology of AD, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. For this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequence (WGS) data released by Alzheimer's Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 and 42,767 deletions and duplications, respectively, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic-White cases on average have 16 more duplications than controls do (p-value 2e-6), and Hispanic cases have larger deletions than controls do (p-value 6.8e-5).
Affiliation(s)
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Albert A. Tucci
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
- Mitchell Conery
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA, United States
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Amanda B. Kuzma
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Otto Valladares
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Yi-Fan Chou
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, NC, United States
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Gerard D. Schellenberg
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
- Department of Statistics, North Carolina State University, Raleigh, NC, United States