1
Schulte SC, Dilthey AT, Klau GW. HOGVAX: Exploiting epitope overlaps to maximize population coverage in vaccine design with application to SARS-CoV-2. Cell Syst 2023; 14:1122-1130.e3. [PMID: 38128484 DOI: 10.1016/j.cels.2023.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 09/19/2023] [Accepted: 11/08/2023] [Indexed: 12/23/2023]
Abstract
The efficacy of epitope vaccines depends on the included epitopes as well as the probability that the selected epitopes are presented by the major histocompatibility complex (MHC) proteins of a vaccinated individual. Designing vaccines that effectively immunize a high proportion of the population is challenging because of high MHC polymorphism, diverging MHC-peptide binding affinities, and physical constraints on epitope vaccine constructs. Here, we present HOGVAX, a combinatorial optimization approach for epitope vaccine design. To optimize population coverage within the constraint of limited vaccine construct space, HOGVAX employs a hierarchical overlap graph (HOG) to identify and exploit overlaps between selected peptides and explicitly models the structure of linkage disequilibrium in the MHC. In a SARS-CoV-2 case study, we demonstrate that HOGVAX-designed vaccines contain substantially more epitopes than vaccines built from concatenated peptides and predict vaccine efficacy in over 98% of the population with high numbers of presented peptides in vaccinated individuals.
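The idea of exploiting overlaps between peptides to save construct space can be illustrated with a small sketch. The code below is not the HOGVAX method (the abstract describes a hierarchical overlap graph combined with combinatorial optimization); it is a simple greedy merge of suffix-prefix overlaps, and the peptide strings and function names are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(peptides: list[str]) -> str:
    """Greedily merge the pair with the largest overlap until one string remains.

    This is a heuristic stand-in, not the optimization model used by HOGVAX.
    """
    unique = sorted(set(peptides))
    # Drop peptides that are already contained in another peptide.
    items = [p for p in unique if not any(p != q and p in q for q in unique)]
    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(items)), 2):
            k = overlap(items[i], items[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        merged = items[best_i] + items[best_j][best_k:]
        items = [p for idx, p in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)
    return items[0] if items else ""


# Hypothetical toy peptides (not real epitopes).
toy_peptides = ["ACDEF", "DEFGH", "FGHIK"]
construct = greedy_superstring(toy_peptides)
print(construct, len(construct), "vs concatenated length", sum(map(len, toy_peptides)))
```

In this toy case the merged construct still contains every peptide as a substring while being shorter than the plain concatenation, which is the space saving the abstract refers to.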
Affiliation(s)
- Sara C Schulte
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Clinic Düsseldorf, Düsseldorf, Germany.
- Gunnar W Klau
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

2
Houwaart T, Scholz S, Pollock NR, Palmer WH, Kichula KM, Strelow D, Le DB, Belick D, Hülse L, Lautwein T, Wachtmeister T, Wollenweber TE, Henrich B, Köhrer K, Parham P, Guethlein LA, Norman PJ, Dilthey AT. Complete sequences of six major histocompatibility complex haplotypes, including all the major MHC class II structures. HLA 2023; 102:28-43. [PMID: 36932816 PMCID: PMC10986641 DOI: 10.1111/tan.15020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 02/10/2023] [Accepted: 02/24/2023] [Indexed: 03/19/2023]
Abstract
Accurate and comprehensive immunogenetic reference panels are key to the successful implementation of population-scale immunogenomics. The 5Mbp Major Histocompatibility Complex (MHC) is the most polymorphic region of the human genome and associated with multiple immune-mediated diseases, transplant matching and therapy responses. Analysis of MHC genetic variation is severely complicated by complex patterns of sequence variation, linkage disequilibrium and a lack of fully resolved MHC reference haplotypes, increasing the risk of spurious findings on analyzing this medically important region. Integrating Illumina, ultra-long Nanopore, and PacBio HiFi sequencing as well as bespoke bioinformatics, we completed five of the alternative MHC reference haplotypes of the current (GRCh38/hg38) build of the human reference genome and added one other. The six assembled MHC haplotypes encompass the DR1 and DR4 haplotype structures in addition to the previously completed DR2 and DR3, as well as six distinct classes of the structurally variable C4 region. Analysis of the assembled haplotypes showed that MHC class II sequence structures, including repeat element positions, are generally conserved within the DR haplotype supergroups, and that sequence diversity peaks in three regions around HLA-A, HLA-B+C, and the HLA class II genes. Demonstrating the potential for improved short-read analysis, the number of proper read pairs recruited to the MHC was found to be increased by 0.06%-0.49% in a 1000 Genomes Project read remapping experiment with seven diverse samples. Furthermore, the assembled haplotypes can serve as references for the community and provide the basis of a structurally accurate genotyping graph of the complete MHC region.
Affiliation(s)
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Stephan Scholz
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Nicholas R. Pollock
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- William H. Palmer
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Katherine M. Kichula
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Daniel Strelow
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Duyen B. Le
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Dana Belick
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Lautwein
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Thorsten Wachtmeister
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tassilo E. Wollenweber
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Birgit Henrich
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Karl Köhrer
- Biologisch‐Medizinisches‐Forschungszentrum (BMFZ), Genomics & Transcriptomics Laboratory, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Parham
- Department of Structural Biology, and Department of Microbiology and Immunology, Stanford University, Stanford, California, USA
- Lisbeth A. Guethlein
- Department of Structural Biology, and Department of Microbiology and Immunology, Stanford University, Stanford, California, USA
- Paul J. Norman
- Department of Biomedical Informatics, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Department of Immunology and Microbiology, Anschutz Medical Campus, University of Colorado, Aurora, Colorado, USA
- Alexander T. Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

3
Houwaart T, Belhaj S, Tawalbeh E, Nagels D, Fröhlich Y, Finzer P, Ciruela P, Sabrià A, Herrero M, Andrés C, Antón A, Benmoumene A, Asskali D, Haidar H, von Dahlen J, Nicolai J, Stiller M, Blum J, Lange C, Adelmann C, Schroer B, Osmers U, Grice C, Kirfel PP, Jomaa H, Strelow D, Hülse L, Pigulla M, Kreuzer P, Tyshaieva A, Weber J, Wienemann T, Kohns Vasconcelos M, Hoffmann K, Lübke N, Hauka S, Andree M, Scholz CJ, Jazmati N, Göbels K, Zotz R, Pfeffer K, Timm J, Ehlkes L, Walker A, Dilthey AT. Integrated genomic surveillance enables tracing of person-to-person SARS-CoV-2 transmission chains during community transmission and reveals extensive onward transmission of travel-imported infections, Germany, June to July 2021. Euro Surveill 2022; 27. [PMID: 36305336 PMCID: PMC9615415 DOI: 10.2807/1560-7917.es.2022.27.43.2101089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] [Open access]
Abstract
Background: Tracking person-to-person SARS-CoV-2 transmission in the population is important to understand the epidemiology of community transmission and may contribute to the containment of SARS-CoV-2. Neither contact tracing nor genomic surveillance alone, however, are typically sufficient to achieve this objective. Aim: We demonstrate the successful application of the integrated genomic surveillance (IGS) system of the German city of Düsseldorf for tracing SARS-CoV-2 transmission chains in the population as well as detecting and investigating travel-associated SARS-CoV-2 infection clusters. Methods: Genomic surveillance, phylogenetic analysis, and structured case interviews were integrated to elucidate two genetically defined clusters of SARS-CoV-2 isolates detected by IGS in Düsseldorf in July 2021. Results: Cluster 1 (n = 67 Düsseldorf cases) and Cluster 2 (n = 36) were detected in a surveillance dataset of 518 high-quality SARS-CoV-2 genomes from Düsseldorf (53% of total cases, sampled mid-June to July 2021). Cluster 1 could be traced back to a complex pattern of transmission in nightlife venues following a putative importation by a SARS-CoV-2-infected return traveller (IP) in late June; 28 SARS-CoV-2 cases could be epidemiologically directly linked to IP. Supported by viral genome data from Spain, Cluster 2 was shown to represent multiple independent introduction events of a viral strain circulating in Catalonia and other European countries, followed by diffuse community transmission in Düsseldorf. Conclusion: IGS enabled high-resolution tracing of SARS-CoV-2 transmission in an internationally connected city during community transmission and provided infection chain-level evidence of the downstream propagation of travel-imported SARS-CoV-2 cases.
Affiliation(s)
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Samir Belhaj
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Emran Tawalbeh
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Dirk Nagels
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Yara Fröhlich
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Patrick Finzer
- Zotz|Klimas, Düsseldorf, Germany
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Pilar Ciruela
- CIBER Epidemiologia y Salud Pública (CIBERESP), Instituto Salud Carlos III, Madrid, Spain
- Sub-Directorate General of Surveillance and Response to Public Health Emergencies, Public Health Agency of Catalonia, Barcelona, Spain
- Aurora Sabrià
- Sub-Directorate General of Surveillance and Response to Public Health Emergencies, Public Health Agency of Catalonia, Barcelona, Spain
- Mercè Herrero
- Sub-Directorate General of Surveillance and Response to Public Health Emergencies, Public Health Agency of Catalonia, Barcelona, Spain
- Cristina Andrés
- Microbiology Unit, Vall d’Hebron University Hospital, Barcelona, Spain
- Andrés Antón
- Microbiology Unit, Vall d’Hebron University Hospital, Barcelona, Spain
- Assia Benmoumene
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Dounia Asskali
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Hussein Haidar
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Janina von Dahlen
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jessica Nicolai
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mygg Stiller
- Medizinische Laboratorien Düsseldorf, Düsseldorf, Germany
- Carla Adelmann
- Solingen Health Authority (Gesundheitsamt Solingen), Solingen, Germany
- Britta Schroer
- Solingen Health Authority (Gesundheitsamt Solingen), Solingen, Germany
- Ute Osmers
- MVZ SYNLAB Leverkusen GmbH, Leverkusen, Germany
- Daniel Strelow
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Moritz Pigulla
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Pascal Kreuzer
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Alona Tyshaieva
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jonas Weber
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Wienemann
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Malte Kohns Vasconcelos
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Nadine Lübke
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Sandra Hauka
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marcel Andree
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Klaus Göbels
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Rainer Zotz
- Department of Hemostasis and Transfusion Medicine, Heinrich Heine University Medical Center, Düsseldorf, Germany
- Zotz|Klimas, Düsseldorf, Germany
- Klaus Pfeffer
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jörg Timm
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lutz Ehlkes
- Düsseldorf Health Authority (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Andreas Walker
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Alexander T. Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

4
Gliga S, Lübke N, Killer A, Gruell H, Walker A, Dilthey AT, Thielen A, Lohr C, Flaßhove C, Krieg S, Pereira JV, Seraphin TP, Zaufel A, Däumer M, Orth HM, Feldt T, Bode JG, Klein F, Timm J, Luedde T, Jensen BEO. Rapid Selection of Sotrovimab Escape Variants in Severe Acute Respiratory Syndrome Coronavirus 2 Omicron-Infected Immunocompromised Patients. Clin Infect Dis 2022; 76:408-415. [PMID: 36189631 PMCID: PMC9619606 DOI: 10.1093/cid/ciac802] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/01/2022] [Revised: 08/30/2022] [Accepted: 09/29/2022] [Indexed: 11/14/2022] [Open access]
Abstract
Background: Monoclonal antibodies (mAbs) that target severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are predominantly less effective against Omicron variants. Immunocompromised patients often experience prolonged viral shedding, resulting in an increased risk of viral escape. Methods: In an observational, prospective cohort, 57 patients infected with Omicron variants who received sotrovimab alone or in combination with remdesivir were followed. The study end points were a decrease in SARS-CoV-2 RNA <10^6 copies/mL in nasopharyngeal swabs at day 21 and the emergence of escape mutations at days 7, 14, and 21 after sotrovimab administration. All SARS-CoV-2 samples were analyzed using whole-genome sequencing. Individual variants within the quasispecies were subsequently quantified and further characterized using a pseudovirus neutralization assay. Results: The majority of patients (43 of 57, 75.4%) were immunodeficient, predominantly due to immunosuppression after organ transplantation or hematologic malignancies. Infections by Omicron/BA.1 comprised 82.5%, while 17.5% were infected by Omicron/BA.2. Twenty-one days after sotrovimab administration, 12 of 43 (27.9%) immunodeficient patients had prolonged viral shedding compared with 1 of 14 (7.1%) immunocompetent patients (P = .011). Viral spike protein mutations, some specific for Omicron (e.g., P337S and/or E340D/V), emerged in 14 of 43 (32.6%) immunodeficient patients, substantially reducing sensitivity to sotrovimab in a pseudovirus neutralization assay. Combination therapy with remdesivir significantly reduced emergence of escape variants. Conclusions: Immunocompromised patients face a considerable risk of prolonged viral shedding and emergence of escape mutations after early therapy with sotrovimab. These findings underscore the importance of careful monitoring and the need for dedicated clinical trials in this patient population.
Affiliation(s)
- Nadine Lübke
- Correspondence: N. Lübke, Institute of Virology, Medical Faculty, Heinrich-Heine-University Düsseldorf, Moorenstrasse 5, 40225 Düsseldorf, Germany
- Henning Gruell
- Institute of Virology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Andreas Walker
- Institute of Virology, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Carolin Lohr
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Charlotte Flaßhove
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Sarah Krieg
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Joanna Ventura Pereira
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Tobias Paul Seraphin
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Alex Zaufel
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Martin Däumer
- Institute of Immunology and Genetics, Kaiserslautern, Germany
- Hans-Martin Orth
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Torsten Feldt
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Johannes G Bode
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
- Florian Klein
- Institute of Virology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne, University of Cologne, Cologne, Germany
- Jörg Timm
- Institute of Virology, University Hospital Düsseldorf, Medical Faculty, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany

5
Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, Dilthey AT, Marschall T. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet 2022; 54:518-525. [PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w] [Citation(s) in RCA: 62] [Impact Index Per Article: 31.0] [Received: 02/15/2021] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
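To make the notion of k-mer counts as genotyping evidence concrete, here is a toy sketch. It only counts k-mers in reads and sums the counts of k-mers that are unique to each candidate allele; the actual inference model of PanGenie is not reproduced, and the sequences, the value of k, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def allele_support(allele_seqs, read_counts, k):
    """For each allele, sum read counts over k-mers found in no other allele."""
    kmer_sets = {
        name: {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for name, seq in allele_seqs.items()
    }
    support = {}
    for name, kmers in kmer_sets.items():
        others = set().union(*(s for n, s in kmer_sets.items() if n != name))
        unique = kmers - others
        support[name] = sum(read_counts[kmer] for kmer in unique)
    return support


# Hypothetical alleles differing by a single base, and two toy reads.
alleles = {"ref": "ACGTACGA", "alt": "ACGTTCGA"}
reads = ["ACGTTCGA", "CGTTCG"]
print(allele_support(alleles, count_kmers(reads, 4), 4))
```

No read alignment is needed in this sketch: the evidence for each allele comes from k-mer lookups alone, which is the property the abstract contrasts with mapping-based workflows.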
Affiliation(s)
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

6
Walker A, Houwaart T, Finzer P, Ehlkes L, Tyshaieva A, Damagnez M, Strelow D, Duplessis A, Nicolai J, Wienemann T, Tamayo T, Kohns Vasconcelos M, Hülse L, Hoffmann K, Lübke N, Hauka S, Andree M, Däumer MP, Thielen A, Kolbe-Busch S, Göbels K, Zotz R, Pfeffer K, Timm J, Dilthey AT. Characterization of SARS-CoV-2 infection clusters based on integrated genomic surveillance, outbreak analysis and contact tracing in an urban setting. Clin Infect Dis 2021; 74:1039-1046. [PMID: 34181711 PMCID: PMC8406867 DOI: 10.1093/cid/ciab588] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 04/19/2021] [Indexed: 01/02/2023] [Open access]
Abstract
Background: Tracing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission chains is still a major challenge for public health authorities when incidental contacts are not recalled or are not perceived as potential risk contacts. Viral sequencing can address key questions about SARS-CoV-2 evolution and may support reconstruction of viral transmission networks by integration of molecular epidemiology into classical contact tracing. Methods: In collaboration with local public health authorities, we set up an integrated system of genomic surveillance in an urban setting, combining (a) viral surveillance sequencing, (b) genetically based identification of infection clusters in the population, (c) integration of public health authority contact tracing data, and (d) a user-friendly dashboard application as a central data analysis platform. Results: Application of the integrated system from August to December 2020 enabled a characterization of viral population structure, analysis of 4 outbreaks at a maximum care hospital, and genetically based identification of 5 putative population infection clusters, all of which were confirmed by contact tracing. The system contributed to the development of improved hospital infection control and prevention measures and enabled the identification of previously unrecognized transmission chains, involving a martial arts gym and establishing a link between the hospital and the local population. Conclusions: Integrated systems of genomic surveillance could contribute to the monitoring and, potentially, improved management of SARS-CoV-2 transmission in the population.
Affiliation(s)
- Andreas Walker
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Patrick Finzer
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Zotz|Klimas, Düsseldorf, Germany
- Lutz Ehlkes
- Düsseldorf Health Department (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Alona Tyshaieva
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Maximilian Damagnez
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Daniel Strelow
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Ashley Duplessis
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jessica Nicolai
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Wienemann
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Teresa Tamayo
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Malte Kohns Vasconcelos
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Nadine Lübke
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Sandra Hauka
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marcel Andree
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Susanne Kolbe-Busch
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Klaus Göbels
- Düsseldorf Health Department (Gesundheitsamt Düsseldorf), Düsseldorf, Germany
- Klaus Pfeffer
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jörg Timm
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany

7
Ptok J, Müller L, Ostermann PN, Ritchie A, Dilthey AT, Theiss S, Schaal H. Modifying splice site usage with ModCon: Maintaining the genetic code while changing the underlying mRNP code. Comput Struct Biotechnol J 2021; 19:3069-3076. [PMID: 34136105 PMCID: PMC8178101 DOI: 10.1016/j.csbj.2021.05.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Revised: 05/14/2021] [Accepted: 05/20/2021] [Indexed: 11/22/2022] [Open access]
Abstract
Codon degeneracy of amino acid sequences permits an additional “mRNP code” layer underlying the genetic code that is related to RNA processing. In pre-mRNA splicing, splice site usage is determined by both intrinsic strength and sequence context providing RNA binding sites for splicing regulatory proteins. In this study, we systematically examined modification of splicing regulatory properties in the neighborhood of a GT site, i.e. potential splice site, without altering the encoded amino acids. We quantified the splicing regulatory properties of the neighborhood around a potential splice site by its Splice Site HEXplorer Weight (SSHW) based on the HEXplorer score algorithm. To systematically modify GT site neighborhoods, either minimizing or maximizing their SSHW, we designed the novel stochastic optimization algorithm ModCon that applies a genetic algorithm with stochastic crossover, insertion and random mutation elements supplemented by a heuristic sliding window approach. To assess the achievable range in SSHW in human splice donors without altering the encoded amino acids, we applied ModCon to a set of 1000 randomly selected Ensembl annotated human splice donor sites, achieving substantial and accurate changes in SSHW. Using ModCon optimization, we successfully switched splice donor usage in a splice site competition reporter containing coding sequences from FANCA, FANCB or BRCA2, while retaining their amino acid coding information. The ModCon algorithm and its R package implementation can assist in reporter design by introducing novel splice sites, silencing accidental, undesired splice sites, or generally modifying the entire mRNP code while maintaining the genetic code.
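The constraint of changing the nucleotide sequence while keeping the encoded amino acids fixed can be illustrated as follows. This is a minimal sketch, not ModCon: it enumerates all synonymous recodings of a very short sequence exhaustively (ModCon, per the abstract, uses a genetic algorithm with a sliding window), and `toy_score` is a hypothetical stand-in for the SSHW, which is not implemented here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_variants(dna: str):
    """Yield every sequence that encodes the same amino acids as `dna`."""
    choices = []
    for i in range(0, len(dna), 3):
        aa = CODON_TO_AA[dna[i:i + 3]]
        choices.append([c for c, a in CODON_TO_AA.items() if a == aa])
    for combo in product(*choices):
        yield "".join(combo)


def toy_score(dna: str) -> int:
    """Hypothetical placeholder score: number of GT dinucleotides (not the SSHW)."""
    return dna.count("GT")


def extremes(dna: str):
    """Return the synonymous variants with the lowest and highest toy score."""
    variants = list(synonymous_variants(dna))
    return min(variants, key=toy_score), max(variants, key=toy_score)


low, high = extremes("GGTGTA")  # encodes Gly-Val
print(low, toy_score(low), high, toy_score(high), translate(low), translate(high))
```

Exhaustive enumeration grows exponentially with sequence length, which is one reason a stochastic search such as the one described in the abstract is needed for realistic sequence neighborhoods.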
Key Words
- A, adenine
- F1, filial sequence 1
- G, guanine
- GA, genetic algorithm
- HBS, HBond score
- HBond score
- HEXplorer score
- HZEI, HEXplorer score
- P1, parental sequence 1
- SA, splice acceptor
- SD, splice donor
- SR proteins, serine- and arginine-rich proteins
- SRP, splicing regulatory protein
- SSHW, splice site HEXplorer weight
- SW, sliding window
- Splice donor
- Splicing regulatory proteins
- Splicing reporter
- T, thymine
- eGFP, enhanced green fluorescent protein
- hnRNP, heterogeneous nuclear ribonucleoproteins
- nt, nucleotides
- pre-mRNA splicing
- pre-mRNA, precursor messenger RNA
- snRNA, small nuclear RNA
Affiliation(s)
- Johannes Ptok
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
| | - Lisa Müller
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
| | - Philipp Niklas Ostermann
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
| | - Anastasia Ritchie
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
| | - Alexander T Dilthey
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Stephan Theiss
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
| | - Heiner Schaal
- Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany
|
8
|
Abstract
The Major Histocompatibility Complex (MHC) on the short arm of chromosome 6 is associated with more diseases than any other region of the genome; it encodes the antigen-presenting Human Leukocyte Antigen (HLA) proteins and is one of the key immunogenetic regions of the genome. Accurate genome inference and interpretation of MHC association signals have traditionally been hampered by the region's uniquely complex features, such as high levels of polymorphism; inter-gene sequence homologies; structural variation; and long-range haplotype structures. Recent algorithmic and technological advances have, however, significantly increased the accessibility of genetic variation in the MHC; these developments include (i) accurate SNP-based HLA type imputation; (ii) genome graph approaches for variation-aware genome inference from next-generation sequencing data; (iii) long-read-based diploid de novo assembly of the MHC; (iv) cost-effective targeted MHC sequencing methods. Applied to hundreds of thousands of samples over the last years, these technologies have already enabled significant biological discoveries, for example in the field of autoimmune disease genetics. Remaining challenges concern the development of integrated methods that leverage haplotype-resolved de novo assembly of the MHC for the development of improved MHC genotyping methods for short reads and the construction of improved reference panels for SNP-based imputation. Improved genome inference in the MHC can crucially contribute to an improved genetic and functional understanding of many immune-related phenotypes and diseases.
Affiliation(s)
- Alexander T Dilthey
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany; Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany; Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
|
9
|
Henrich B, Hammerlage S, Scharf S, Haberhausen D, Fürnkranz U, Köhrer K, Peitzmann L, Fiori PL, Spergser J, Pfeffer K, Dilthey AT. Characterisation of mobile genetic elements in Mycoplasma hominis with the description of ICEHo-II, a variant mycoplasma integrative and conjugative element. Mob DNA 2020; 11:30. [PMID: 33292499 PMCID: PMC7648426 DOI: 10.1186/s13100-020-00225-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/29/2020] [Accepted: 10/22/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Mobile genetic elements are found in genomes throughout the microbial world, mediating genome plasticity and important prokaryotic phenotypes. Even the cell wall-less mycoplasmas, which are known to harbour a minimal set of genes, seem to accumulate mobile genetic elements. In Mycoplasma hominis, a facultative pathogen of the human urogenital tract and an inherently very heterogeneous species, four different MGE-classes had been detected until now: insertion sequence ISMhom-1, prophage MHoV-1, a tetracycline resistance mediating transposon, and ICEHo, a species-specific variant of a mycoplasma integrative and conjugative element encoding a T4SS secretion system (termed MICE). RESULTS: To characterize the prevalence of these MGEs, genomes of 23 M. hominis isolates were assembled using whole genome sequencing and bioinformatically analysed for the presence of mobile genetic elements. In addition to the previously described MGEs, a new ICEHo variant was found, which we designate ICEHo-II. Of 15 ICEHo-II genes, five are common MICE genes; eight are unique to ICEHo-II; and two represent a duplication of a gene also present in ICEHo-I. In 150 M. hominis isolates and based on a screening PCR, prevalence of ICEHo-I was 40.7%; of ICEHo-II, 28.7%; and of both elements, 15.3%. Activity of ICEHo-I and -II was demonstrated by detection of circularized extrachromosomal forms of the elements through PCR and subsequent Sanger sequencing. CONCLUSIONS: Nanopore sequencing enabled the identification of mobile genetic elements and of ICEHo-II, a novel MICE element of M. hominis, whose phenotypic impact and potential impact on pathogenicity can now be elucidated.
Affiliation(s)
- Birgit Henrich
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany.
| | - Stephanie Hammerlage
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany
| | - Sebastian Scharf
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany.,Department of Haematology, Oncology and Clinical Immunology, Medical Faculty, University of Duesseldorf, Duesseldorf, Germany
| | - Diana Haberhausen
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany
| | - Ursula Fürnkranz
- Institute for Specific Prophylaxis and Tropical Medicine, Centre for Pathophysiology, Immunology and Infectiology, Medical University of Vienna, Vienna, Austria
| | - Karl Köhrer
- Biological and Medical Research Centre (BMFZ) of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany
| | - Lena Peitzmann
- Biological and Medical Research Centre (BMFZ) of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany
| | - Pier Luigi Fiori
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - Joachim Spergser
- Institute of Microbiology, University of Veterinary Medicine Vienna, Vienna, Austria
| | - Klaus Pfeffer
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany
| | - Alexander T Dilthey
- Institute of Med. Microbiology and Hospital Hygiene of the Heinrich-Heine-University Duesseldorf, Duesseldorf, Germany.,Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany.,Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
|
10
|
Chin CS, Wagner J, Zeng Q, Garrison E, Garg S, Fungtammasan A, Rautiainen M, Aganezov S, Kirsche M, Zarate S, Schatz MC, Xiao C, Rowell WJ, Markello C, Farek J, Sedlazeck FJ, Bansal V, Yoo B, Miller N, Zhou X, Carroll A, Barrio AM, Salit M, Marschall T, Dilthey AT, Zook JM. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat Commun 2020; 11:4794. [PMID: 32963235 PMCID: PMC7508831 DOI: 10.1038/s41467-020-18564-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 12/21/2019] [Accepted: 08/27/2020] [Indexed: 01/20/2023] Open
Abstract
Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.
Affiliation(s)
- Chen-Shan Chin
- DNAnexus, Inc, 1975 W El Camino Real, Suite 204, Mountain View, CA, 94040, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD, 20899, USA
| | - Qiandong Zeng
- Laboratory Corporation of America Holdings, 3400 Computer Drive, Westborough, MA, 01581, USA
| | - Erik Garrison
- University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, 95064, USA
| | - Shilpa Garg
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | | | - Mikko Rautiainen
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123, Saarbrücken, Germany
- Saarland Graduate School for Computer Science, Saarland Informatics Campus E1.3, 66123, Saarbrücken, Germany
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY, 11724, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | | | - Charles Markello
- University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, 95064, USA
| | - Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Vikas Bansal
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Byunggil Yoo
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, 64108, USA
| | - Neil Miller
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, 64108, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA
| | | | - Marc Salit
- Joint Initiative for Metrology in Biology, Stanford, CA, 94305, USA
| | - Tobias Marschall
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD, 20899, USA.
|
11
|
Sallah N, Miley W, Labo N, Carstensen T, Fatumo S, Gurdasani D, Pollard MO, Dilthey AT, Mentzer AJ, Marshall V, Cornejo Castro EM, Pomilla C, Young EH, Asiki G, Hibberd ML, Sandhu M, Kellam P, Newton R, Whitby D, Barroso I. Distinct genetic architectures and environmental factors associate with host response to the γ2-herpesvirus infections. Nat Commun 2020; 11:3849. [PMID: 32737300 PMCID: PMC7395761 DOI: 10.1038/s41467-020-17696-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/02/2019] [Accepted: 07/13/2020] [Indexed: 01/05/2023] Open
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) and Epstein-Barr Virus (EBV) establish life-long infections and are associated with malignancies. Striking geographic variation in incidence and the fact that virus alone is insufficient to cause disease, suggests other co-factors are involved. Here we present epidemiological analysis and genome-wide association study (GWAS) in 4365 individuals from an African population cohort, to assess the influence of host genetic and non-genetic factors on virus antibody responses. EBV/KSHV co-infection (OR = 5.71(1.58-7.12)), HIV positivity (OR = 2.22(1.32-3.73)) and living in a more rural area (OR = 1.38(1.01-1.89)) are strongly associated with immunogenicity. GWAS reveals associations with KSHV antibody response in the HLA-B/C region (p = 6.64 × 10⁻⁰⁹). For EBV, associations are identified for VCA (rs71542439, p = 1.15 × 10⁻¹²). Human leucocyte antigen (HLA) and trans-ancestry fine-mapping substantiate that distinct variants in HLA-DQA1 (p = 5.24 × 10⁻⁴⁴) are driving associations for EBNA-1 in Africa. This study highlights complex interactions between KSHV and EBV, in addition to distinct genetic architectures resulting in important differences in pathogenesis and transmission.
Affiliation(s)
- Neneh Sallah
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,London School of Hygiene & Tropical Medicine, London, UK
| | - Wendell Miley
- Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
| | - Nazzarena Labo
- Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
| | - Tommy Carstensen
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,Department of Medicine, University of Cambridge, Cambridge, UK
| | - Segun Fatumo
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,London School of Hygiene & Tropical Medicine, London, UK.,MRC/UVRI at the London School of Hygiene & Tropical Medicine, Entebbe, Uganda
| | - Deepti Gurdasani
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,Queen Mary University London, London, UK
| | - Martin O Pollard
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,Department of Medicine, University of Cambridge, Cambridge, UK
| | - Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Alexander J Mentzer
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.,Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Vickie Marshall
- Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
| | - Elena M Cornejo Castro
- Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
| | - Cristina Pomilla
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,Department of Medicine, University of Cambridge, Cambridge, UK
| | - Elizabeth H Young
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,Department of Medicine, University of Cambridge, Cambridge, UK
| | - Gershim Asiki
- African Population and Health Research Center, Nairobi, Kenya
| | | | | | - Paul Kellam
- Department of Infectious Diseases, Imperial College London, London, UK.,Kymab Ltd, Babraham Research Complex, Cambridge, UK
| | - Robert Newton
- MRC/UVRI at the London School of Hygiene & Tropical Medicine, Entebbe, Uganda
| | - Denise Whitby
- Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
| | - Inês Barroso
- The Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK.,MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.,Exeter Centre of ExcEllence in Diabetes (ExCEED), University of Exeter Medical School, Exeter, UK
|
12
|
Dilthey AT, Mentzer AJ, Carapito R, Cutland C, Cereb N, Madhi SA, Rhie A, Koren S, Bahram S, McVean G, Phillippy AM. HLA*LA-HLA typing from linearly projected graph alignments. Bioinformatics 2020; 35:4394-4396. [PMID: 30942877 PMCID: PMC6821427 DOI: 10.1093/bioinformatics/btz235] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 10/26/2018] [Revised: 02/26/2019] [Accepted: 04/02/2019] [Indexed: 11/13/2022] Open
Abstract
Summary: HLA*LA implements a new graph alignment model for human leukocyte antigen (HLA) type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data) and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours per sample. Availability and implementation: HLA*LA is implemented in C++ and Perl and freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander T Dilthey
- Institute of Medical Microbiology, University Hospital of Dusseldorf, Dusseldorf, North Rhine-Westphalia, Germany.,Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA.,Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Alexander J Mentzer
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.,Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Raphael Carapito
- Laboratoire d'ImmunoRhumatologie Moléculaire, Plateforme GENOMAX, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Faculté de Médecine, Université de Strasbourg, France.,Service d'Immunologie Biologique, Nouvel Hôpital Civil, Strasbourg, France
| | - Clare Cutland
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.,Department of Science/National Research Foundation: Vaccine Preventable Diseases, Faculty of Health Science, University of the Witwatersrand, Johannesburg, South Africa
| | | | - Shabir A Madhi
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.,Department of Science/National Research Foundation: Vaccine Preventable Diseases, Faculty of Health Science, University of the Witwatersrand, Johannesburg, South Africa
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Seiamak Bahram
- Laboratoire d'ImmunoRhumatologie Moléculaire, Plateforme GENOMAX, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Faculté de Médecine, Université de Strasbourg, France.,Service d'Immunologie Biologique, Nouvel Hôpital Civil, Strasbourg, France
| | - Gil McVean
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.,Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
|
13
|
Walker A, Houwaart T, Wienemann T, Vasconcelos MK, Strelow D, Senff T, Hülse L, Adams O, Andree M, Hauka S, Feldt T, Jensen BE, Keitel V, Kindgen-Milles D, Timm J, Pfeffer K, Dilthey AT. Genetic structure of SARS-CoV-2 reflects clonal superspreading and multiple independent introduction events, North-Rhine Westphalia, Germany, February and March 2020. Euro Surveill 2020; 25. [PMID: 32524946 PMCID: PMC7336109 DOI: 10.2807/1560-7917.es.2020.25.22.2000746] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 01/12/2023]
Abstract
We whole-genome sequenced 55 SARS-CoV-2 isolates from Germany to investigate SARS-CoV-2 outbreaks in 2020 in the Heinsberg district and Düsseldorf. While the genetic structure of the Heinsberg outbreak indicates a clonal origin, reflecting superspreading dynamics from mid-February during the carnival season, distinct viral strains were circulating in Düsseldorf in March, reflecting the city’s international links. Limited detection of Heinsberg strains in the Düsseldorf area despite geographical proximity may reflect efficient containment and contact-tracing efforts.
Affiliation(s)
- Andreas Walker
- These authors contributed equally.,Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.,These authors contributed equally
| | - Tobias Wienemann
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Malte Kohns Vasconcelos
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Daniel Strelow
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Tina Senff
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Lisanna Hülse
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Ortwin Adams
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Marcel Andree
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Sandra Hauka
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Torsten Feldt
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Björn-Erik Jensen
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Verena Keitel
- Department of Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Detlef Kindgen-Milles
- Department of Anaesthesiology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jörg Timm
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Klaus Pfeffer
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
|
14
|
Dilthey AT, Meyer SA, Kaasch AJ. Ultraplexing: increasing the efficiency of long-read sequencing for hybrid assembly with k-mer-based multiplexing. Genome Biol 2020; 21:68. [PMID: 32171299 PMCID: PMC7071681 DOI: 10.1186/s13059-020-01974-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/19/2019] [Accepted: 02/24/2020] [Indexed: 01/10/2023] Open
Abstract
Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method to improve per-sample sequencing cost and hands-on time of Nanopore sequencing for hybrid assembly by at least 50% compared to molecular barcoding while maintaining high assembly quality. Ultraplexing requires the availability of Illumina data and uses inter-sample genetic variability to assign reads to isolates, which obviates the need for molecular barcoding. Thus, Ultraplexing can enable significant sequencing and labor cost reductions in large-scale bacterial genome projects.
Affiliation(s)
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany.,Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
| | - Sebastian A Meyer
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
| | - Achim J Kaasch
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany.,Institute of Medical Microbiology and Hospital Hygiene, University Hospital, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
|
15
|
Dilthey AT, Jain C, Koren S, Phillippy AM. Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps. Nat Commun 2019; 10:3066. [PMID: 31296857 PMCID: PMC6624308 DOI: 10.1038/s41467-019-10934-2] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 12/07/2018] [Accepted: 06/11/2019] [Indexed: 12/20/2022] Open
Abstract
Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r² > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and detection of incongruities between sample and reference genomes. Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.
Affiliation(s)
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich-Heine-University Düsseldorf, Düsseldorf, North Rhine-Westphalia, Germany.,Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
| | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA.,Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
|
16
|
Hanscombe KB, Morris DL, Noble JA, Dilthey AT, Tombleson P, Kaufman KM, Comeau M, Langefeld CD, Alarcon-Riquelme ME, Gaffney PM, Jacob CO, Sivils KL, Tsao BP, Alarcon GS, Brown EE, Croker J, Edberg J, Gilkeson G, James JA, Kamen DL, Kelly JA, McCune J, Merrill JT, Petri M, Ramsey-Goldman R, Reveille JD, Salmon JE, Scofield H, Utset T, Wallace DJ, Weisman MH, Kimberly RP, Harley JB, Lewis CM, Criswell LA, Vyse TJ. Genetic fine mapping of systemic lupus erythematosus MHC associations in Europeans and African Americans. Hum Mol Genet 2019; 27:3813-3824. [PMID: 30085094 PMCID: PMC6196648 DOI: 10.1093/hmg/ddy280] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 03/28/2018] [Accepted: 07/24/2018] [Indexed: 11/14/2022] Open
Abstract
Genetic variation within the major histocompatibility complex (MHC) contributes substantial risk for systemic lupus erythematosus, but high gene density, extreme polymorphism and extensive linkage disequilibrium (LD) have made fine mapping challenging. To address the problem, we compared two association techniques in two ancestrally diverse populations, African Americans (AAs) and Europeans (EURs). We observed a greater number of Human Leucocyte Antigen (HLA) alleles in AA consistent with the elevated level of recombination in this population. In EUR we observed 50 different A-C-B-DRB1-DQA-DQB multilocus haplotype sequences per hundred individuals; in the AA sample, these multilocus haplotypes were twice as common compared to Europeans. We also observed a strong narrow class II signal in AA as opposed to the long-range LD observed in EUR that includes class I alleles. We performed a Bayesian model choice of the classical HLA alleles and a frequentist analysis that combined both single nucleotide polymorphisms (SNPs) and classical HLA alleles. Both analyses converged on a similar subset of risk HLA alleles: in EUR HLA-B*08:01 + B*18:01 + (DRB1*15:01 frequentist only) + DQA*01:02 + DQB*02:01 + DRB3*02 and in AA HLA-C*17:01 + B*08:01 + DRB1*15:03 + (DQA*01:02 frequentist only) + DQA*02:01 + DQA*05:01 + DQA*05:05 + DQB*03:19 + DQB*02:02. We observed two additional independent SNP associations in both populations: EUR rs146903072 and rs501480; AA rs389883 and rs114118665. The DR2 serotype was best explained by DRB1*15:03 + DQA*01:02 in AA and by DRB1*15:01 + DQA*01:02 in EUR. The DR3 serotype was best explained by DQA*05:01 in AA and by DQB*02:01 in EUR. Despite some differences in underlying HLA allele risk models in EUR and AA, SNP signals across the extended MHC showed remarkable similarity and significant concordance in direction of effect for risk-associated variants.
Affiliation(s)
- Ken B Hanscombe
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | - David L Morris
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | - Janelle A Noble
- CHORI, Children's Hospital Oakland Research Institute, Oakland, California, USA
| | | | - Philip Tombleson
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | - Kenneth M Kaufman
- Center for Autoimmune Genomics and Etiology (CAGE), Department of Pediatrics, Cincinnati Children's Medical Center & University of Cincinnati and the US Department of Veterans Affairs Medical Center, Cincinnati, OH, USA
| | - Mary Comeau
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Carl D Langefeld
- Center for Public Health Genomics, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Marta E Alarcon-Riquelme
- Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), Granada, Spain; Unit of Chronic Inflammation, Institute of Environmental Medicine, Karolinska Institute, Sweden
| | - Patrick M Gaffney
- Arthritis & Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | - Chaim O Jacob
- Keck School of Medicine of USC, Los Angeles, CA, USA
| | - Kathy L Sivils
- Arthritis & Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | - Betty P Tsao
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
| | - Graciela S Alarcon
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Elizabeth E Brown
- Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jennifer Croker
- Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jeff Edberg
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Gary Gilkeson
- Division of Rheumatology, Medical University of South Carolina, Charleston, SC, USA
| | - Judith A James
- Arthritis & Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA; Division of Rheumatology, Cedars Sinai Medical Center, Los Angeles, CA, USA
| | - Diane L Kamen
- Division of Rheumatology, Medical University of South Carolina, Charleston, SC, USA
| | - Jennifer A Kelly
- Arthritis & Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
| | - Joseph McCune
- Michigan Medicine Rheumatology Clinic, Taubman Center Floor 3 Reception A, 1500 E Medical Center Dr SPC 5358, Ann Arbor, MI, USA
| | - Joan T Merrill
- Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK, USA
| | - Michelle Petri
- Division of Rheumatology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | | | - John D Reveille
- Department of Internal Medicine, The University of Texas, Fannin, MSB, Houston, TX, USA
| | - Jane E Salmon
- Division of Rheumatology, Hospital for Special Surgery-Weill Cornell Medicine, New York, NY, USA
| | - Hal Scofield
- Arthritis & Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA; Oklahoma Clinical and Translational Science Institute, University of Oklahoma Health Sciences Center, 920 NE Stanton L. Young, Oklahoma City, OK, USA
| | - Tammy Utset
- University of Chicago Pritzker School of Medicine, Chicago, IL, USA
| | - Daniel J Wallace
- Division of Rheumatology, Cedars Sinai Medical Center, Los Angeles, CA, USA
| | - Michael H Weisman
- Division of Rheumatology, Cedars Sinai Medical Center, Los Angeles, CA, USA
| | - Robert P Kimberly
- Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - John B Harley
- Center for Autoimmune Genomics and Etiology (CAGE), Department of Pediatrics, Cincinnati Children's Medical Center & University of Cincinnati and the US Department of Veterans Affairs Medical Center, Cincinnati, OH, USA
| | - Cathryn M Lewis
- Department of Medical and Molecular Genetics, King's College London, London, UK; MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
| | - Lindsey A Criswell
- Rosalind Russell / Ephraim P Engleman Rheumatology Research Center, Division of Rheumatology, UCSF School of Medicine, San Francisco, CA, USA
| | - Timothy J Vyse
- Department of Medical and Molecular Genetics, King's College London, London, UK
| |
|
17
|
Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nat Biotechnol 2018; 36:nbt.4277. [PMID: 30346939 PMCID: PMC6476705 DOI: 10.1038/nbt.4277] [Citation(s) in RCA: 239] [Impact Index Per Article: 39.8] [Received: 02/25/2018] [Accepted: 09/10/2018] [Indexed: 12/20/2022]
Abstract
Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| | - Alexander T. Dilthey
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
- Institute of Medical Microbiology, Heinrich-Heine-University Düsseldorf, Düsseldorf, North Rhine-Westphalia, Germany
| | - Derek M. Bickhart
- Cell Wall Biology and Utilization Laboratory, ARS USDA, Madison, Wisconsin, USA
| | | | - Stefan Hiendleder
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy SA, Australia
- Robinson Research Institute, The University of Adelaide, Adelaide SA, Australia
| | - John L. Williams
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy SA, Australia
| | | | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA
| |
|
18
|
Biederstedt E, Oliver JC, Hansen NF, Jajoo A, Dunn N, Olson A, Busby B, Dilthey AT. NovoGraph: Genome graph construction from multiple long-read de novo assemblies. F1000Res 2018; 7:1391. [PMID: 30613392 PMCID: PMC6305223 DOI: 10.12688/f1000research.15895.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/14/2018] [Indexed: 10/06/2023] Open
Abstract
Genome graphs are emerging as an important novel approach to the analysis of high-throughput sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and uses a simple criterion of homologous-identical recombination to convert the multiple sequence alignment into a graph. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.
Affiliation(s)
- Evan Biederstedt
- Weill Cornell Medicine, New York, NY, 10065, USA
- New York Genome Center, New York, NY, 10013, USA
| | - Jeffrey C. Oliver
- Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, 85721, USA
| | - Nancy F. Hansen
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20817, USA
| | - Aarti Jajoo
- Baylor College of Medicine, Houston, TX, 77030, USA
| | - Nathan Dunn
- Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Ben Busby
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20817, USA
| | - Alexander T. Dilthey
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20817, USA
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
| |
|
19
|
Biederstedt E, Oliver JC, Hansen NF, Jajoo A, Dunn N, Olson A, Busby B, Dilthey AT. NovoGraph: Human genome graph construction from multiple long-read de novo assemblies. F1000Res 2018; 7:1391. [PMID: 30613392 PMCID: PMC6305223 DOI: 10.12688/f1000research.15895.2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 12/03/2018] [Indexed: 01/06/2023] Open
Abstract
Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.
Affiliation(s)
- Evan Biederstedt
- Weill Cornell Medicine, New York, NY, 10065, USA
- New York Genome Center, New York, NY, 10013, USA
| | - Jeffrey C. Oliver
- Office of Digital Innovation and Stewardship, University Libraries, University of Arizona, Tucson, AZ, 85721, USA
| | - Nancy F. Hansen
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20817, USA
| | - Aarti Jajoo
- Baylor College of Medicine, Houston, TX, 77030, USA
| | - Nathan Dunn
- Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Ben Busby
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20817, USA
| | - Alexander T. Dilthey
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20817, USA
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
| |
|
20
|
Kim JH, Dilthey AT, Nagaraja R, Lee HS, Koren S, Dudekula D, Wood Iii WH, Piao Y, Ogurtsov AY, Utani K, Noskov VN, Shabalina SA, Schlessinger D, Phillippy AM, Larionov V. Variation in human chromosome 21 ribosomal RNA genes characterized by TAR cloning and long-read sequencing. Nucleic Acids Res 2018; 46:6712-6725. [PMID: 29788454 PMCID: PMC6061828 DOI: 10.1093/nar/gky442] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 03/06/2018] [Accepted: 05/08/2018] [Indexed: 12/31/2022] Open
Abstract
Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ∼0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveal heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.
MESH Headings
- Animals
- Cell Line
- Chromosomes, Human, Pair 21
- Cloning, Molecular
- DNA, Ribosomal/chemistry
- DNA, Ribosomal/isolation & purification
- DNA, Ribosomal Spacer/chemistry
- Genes, rRNA
- Genetic Variation
- Humans
- Mice
- Nucleic Acid Conformation
- Nucleolus Organizer Region/chemistry
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/metabolism
- Sequence Analysis, DNA
Affiliation(s)
- Jung-Hyun Kim
- National Cancer Institute, Developmental Therapeutics Branch, Bethesda, MD 20892, USA
| | - Alexander T Dilthey
- National Human Genome Research Institute, Computational and Statistical Genomics Branch, Bethesda, MD 20892, USA
| | - Ramaiah Nagaraja
- National Institute on Aging, Laboratory of Genetics and Genomics, Baltimore, MD 21224, USA
| | - Hee-Sheung Lee
- National Cancer Institute, Developmental Therapeutics Branch, Bethesda, MD 20892, USA
| | - Sergey Koren
- National Human Genome Research Institute, Computational and Statistical Genomics Branch, Bethesda, MD 20892, USA
| | - Dawood Dudekula
- National Institute on Aging, Laboratory of Genetics and Genomics, Baltimore, MD 21224, USA
| | - William H Wood III
- National Institute on Aging, Laboratory of Genetics and Genomics, Baltimore, MD 21224, USA
| | - Yulan Piao
- National Institute on Aging, Laboratory of Genetics and Genomics, Baltimore, MD 21224, USA
| | - Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20892, USA
| | - Koichi Utani
- National Cancer Institute, Developmental Therapeutics Branch, Bethesda, MD 20892, USA
| | - Vladimir N Noskov
- National Cancer Institute, Developmental Therapeutics Branch, Bethesda, MD 20892, USA
| | - Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20892, USA
| | - David Schlessinger
- National Institute on Aging, Laboratory of Genetics and Genomics, Baltimore, MD 21224, USA
| | - Adam M Phillippy
- National Human Genome Research Institute, Computational and Statistical Genomics Branch, Bethesda, MD 20892, USA
| | - Vladimir Larionov
- National Cancer Institute, Developmental Therapeutics Branch, Bethesda, MD 20892, USA
| |
|
21
|
Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, Malla S, Marriott H, Nieto T, O'Grady J, Olsen HE, Pedersen BS, Rhie A, Richardson H, Quinlan AR, Snutch TP, Tee L, Paten B, Phillippy AM, Simpson JT, Loman NJ, Loose M. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 2018; 36:338-345. [PMID: 29431738 PMCID: PMC5889714 DOI: 10.1038/nbt.4060] [Citation(s) in RCA: 996] [Impact Index Per Article: 166.0] [Received: 04/20/2017] [Accepted: 12/11/2017] [Indexed: 02/07/2023]
Abstract
We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
Affiliation(s)
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Josh Quick
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Arthur C Rand
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Thomas A Sasani
- Department of Human Genetics, University of Utah, Salt Lake City, Utah USA
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah USA
| | - John R Tyson
- Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada
| | - Andrew D Beggs
- Surgical Research Laboratory, Institute of Cancer & Genomic Science, University of Birmingham, UK
| | - Alexander T Dilthey
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland USA
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Sunir Malla
- DeepSeq, School of Life Sciences, University of Nottingham, UK
| | - Hannah Marriott
- DeepSeq, School of Life Sciences, University of Nottingham, UK
| | - Tom Nieto
- Surgical Research Laboratory, Institute of Cancer & Genomic Science, University of Birmingham, UK
| | - Justin O'Grady
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - Hugh E Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, Utah USA
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland USA
| | | | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, Utah USA
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah USA
| | - Terrance P Snutch
- Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada
| | - Louise Tee
- Surgical Research Laboratory, Institute of Cancer & Genomic Science, University of Birmingham, UK
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland USA
| | - Jared T Simpson
- Ontario Institute for Cancer Research, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
| | - Nicholas J Loman
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Matthew Loose
- DeepSeq, School of Life Sciences, University of Nottingham, UK
| |
|
22
|
Dilthey AT, Gourraud PA, Mentzer AJ, Cereb N, Iqbal Z, McVean G. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs. PLoS Comput Biol 2016; 12:e1005151. [PMID: 27792722 PMCID: PMC5085092 DOI: 10.1371/journal.pcbi.1005151] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 02/29/2016] [Accepted: 09/18/2016] [Indexed: 01/04/2023] Open
Abstract
Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application.
Determining an individual's HLA type (the sequence of the exons of the HLA genes) is important in many areas of biomedical research. For example, HLA types shape immune epitope repertoires, which are relevant in cancer immunotherapy, and influence autoimmune and infectious disease risk. Whole-genome sequencing data, currently being generated for hundreds of thousands of individuals, contains the information necessary for HLA typing, but inferring accurate HLA types from these is a challenging problem. First, the HLA genes are the most polymorphic genes in the human genome; second, these genes and their variant alleles exhibit high degrees of sequence similarity (due to a shared evolutionary origin). This makes it difficult to establish which specific HLA gene a given observed sequencing read derives from. We show that this problem can be addressed using a Population Reference Graph (PRG): for each gene, the PRG contains not only the reference sequence but also variant alleles, thus enabling, using a novel sequence-to-graph mapping algorithm, the accurate mapping of reads to HLA genes. We also show that HLA*PRG, the algorithm implementing our approach, achieves, based on standard whole-genome sequencing data, accuracies comparable to those of specialized gold-standard methods. HLA*PRG is open source and freely available.
Affiliation(s)
- Alexander T. Dilthey
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- NHGRI-NIH, Bethesda, MD, United States of America
- * E-mail:
| | - Pierre-Antoine Gourraud
- Neurology Department, UCSF, San Francisco, United States of America
- Inserm unit 1064 ATIP-Avenir team 6, University of Nantes–Nantes University Hospitals, Nantes, France
| | - Alexander J. Mentzer
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nezih Cereb
- Histogenetics, Ossining, United States of America
| | - Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| |
|
23
|
Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C, Shah TS, Spencer C, Booth D, Goris A, Oturai A, Saarela J, Fontaine B, Hemmer B, Martin C, Zipp F, D'Alfonso S, Martinelli-Boneschi F, Taylor B, Harbo HF, Kockum I, Hillert J, Olsson T, Ban M, Oksenberg JR, Hintzen R, Barcellos LF, Agliardi C, Alfredsson L, Alizadeh M, Anderson C, Andrews R, Søndergaard HB, Baker A, Band G, Baranzini SE, Barizzone N, Barrett J, Bellenguez C, Bergamaschi L, Bernardinelli L, Berthele A, Biberacher V, Binder TMC, Blackburn H, Bomfim IL, Brambilla P, Broadley S, Brochet B, Brundin L, Buck D, Butzkueven H, Caillier SJ, Camu W, Carpentier W, Cavalla P, Celius EG, Coman I, Comi G, Corrado L, Cosemans L, Cournu-Rebeix I, Cree BAC, Cusi D, Damotte V, Defer G, Delgado SR, Deloukas P, di Sapio A, Dilthey AT, Donnelly P, Dubois B, Duddy M, Edkins S, Elovaara I, Esposito F, Evangelou N, Fiddes B, Field J, Franke A, Freeman C, Frohlich IY, Galimberti D, Gieger C, Gourraud PA, Graetz C, Graham A, Grummel V, Guaschino C, Hadjixenofontos A, Hakonarson H, Halfpenny C, Hall G, Hall P, Hamsten A, Harley J, Harrower T, Hawkins C, Hellenthal G, Hillier C, Hobart J, Hoshi M, Hunt SE, Jagodic M, Jelčić I, Jochim A, Kendall B, Kermode A, Kilpatrick T, Koivisto K, Konidari I, Korn T, Kronsbein H, Langford C, Larsson M, Lathrop M, Lebrun-Frenay C, Lechner-Scott J, Lee MH, Leone MA, Leppä V, Liberatore G, Lie BA, Lill CM, Lindén M, Link J, Luessi F, Lycke J, Macciardi F, Männistö S, Manrique CP, Martin R, Martinelli V, Mason D, Mazibrada G, McCabe C, Mero IL, Mescheriakova J, Moutsianas L, Myhr KM, Nagels G, Nicholas R, Nilsson P, Piehl F, Pirinen M, Price SE, Quach H, Reunanen M, Robberecht W, Robertson NP, Rodegher M, Rog D, Salvetti M, Schnetz-Boutaud NC, Sellebjerg F, Selter RC, Schaefer C, Shaunak S, Shen L, Shields S, Siffrin V, Slee M, Sorensen PS, Sorosina M, Sospedra M, Spurkland A, Strange A, Sundqvist E, Thijs V, Thorpe J, Ticca A, Tienari P, van Duijn C, Visser EM, Vucic S, 
Westerlind H, Wiley JS, Wilkins A, Wilson JF, Winkelmann J, Zajicek J, Zindler E, Haines JL, Pericak-Vance MA, Ivinson AJ, Stewart G, Hafler D, Hauser SL, Compston A, McVean G, De Jager P, Sawcer SJ, McCauley JL. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet 2013; 45:1353-60. [PMID: 24076602 PMCID: PMC3832895 DOI: 10.1038/ng.2770] [Citation(s) in RCA: 980] [Impact Index Per Article: 89.1] [Received: 04/24/2013] [Accepted: 09/03/2013] [Indexed: 12/13/2022]
Abstract
Using the ImmunoChip custom genotyping array, we analyzed 14,498 subjects with multiple sclerosis and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (P < 1.0 × 10⁻⁴). In a replication phase, we combined these data with previous genome-wide association study (GWAS) data from an independent 14,802 subjects with multiple sclerosis and 26,703 healthy controls. In these 80,094 individuals of European ancestry, we identified 48 new susceptibility variants (P < 5.0 × 10⁻⁸), 3 of which we found after conditioning on previously identified variants. Thus, there are now 110 established multiple sclerosis risk variants at 103 discrete loci outside of the major histocompatibility complex. With high-resolution Bayesian fine mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalog of multiple sclerosis risk variants and illustrates the value of fine mapping in the resolution of GWAS signals.
|
24
|
Abstract
MOTIVATION: Genetic variation at classical HLA alleles influences many phenotypes, including susceptibility to autoimmune disease, resistance to pathogens and the risk of adverse drug reactions. However, classical HLA typing methods are often prohibitively expensive for large-scale studies. We previously described a method for imputing classical alleles from linked SNP genotype data. Here, we present a modification of the original algorithm implemented in a freely available software suite that combines local data preparation and QC with probabilistic imputation through a remote server. RESULTS: We introduce two modifications to the original algorithm. First, we present a novel SNP selection function that leads to pronounced increases (up by 40% in some scenarios) in call rate. Second, we develop a parallelized model building algorithm that allows us to process a reference set of over 2500 individuals. In a validation experiment, we show that our framework produces highly accurate HLA type imputations at class I and class II loci for independent datasets: at call rates of 95-99%, imputation accuracy is between 92% and 98% at the four-digit level and over 97% at the two-digit level. We demonstrate utility of the method through analysis of a genome-wide association study for psoriasis where there is a known classical HLA risk allele (HLA-C*06:02). We show that the imputed allele shows stronger association with disease than any single SNP within the region. The imputation framework, HLA*IMP, provides a powerful tool for dissecting the architecture of genetic risk within the HLA. AVAILABILITY: HLA*IMP, implemented in C++ and Perl, is available from http://oxfordhla.well.ox.ac.uk and is free for academic use.
|