1
Lai SK, Luo AC, Chiu IH, Chuang HW, Chou TH, Hung TK, Hsu JS, Chen CY, Yang WS, Yang YC, Chen PL. A novel framework for human leukocyte antigen (HLA) genotyping using probe capture-based targeted next-generation sequencing and computational analysis. Comput Struct Biotechnol J 2024; 23:1562-1571. [PMID: 38650588 PMCID: PMC11035020 DOI: 10.1016/j.csbj.2024.03.030] [Received: 11/02/2023] [Revised: 03/20/2024] [Accepted: 03/31/2024] [Indexed: 04/25/2024]
Abstract
Human leukocyte antigen (HLA) genes play pivotal roles in numerous immunological applications. Given the immense number of polymorphisms, achieving accurate high-throughput HLA typing remains challenging. This study aimed to harness the human pan-genome reference consortium (HPRC) resources as a potential benchmark for HLA reference materials. We meticulously annotated specific four field-resolution alleles for 11 HLA genes (HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, -DRB1, -DRB3, -DRB4 and -DRB5) from 44 high-quality HPRC personal genome assemblies. For sequencing, we crafted HLA-specific probes and conducted capture-based targeted sequencing of the genomic DNA of the HPRC cohort, ensuring focused and comprehensive coverage of the HLA region of interest. We used publicly available short-read whole-genome sequencing (WGS) data from identical samples to offer a comparative perspective. To decipher the vast amount of sequencing data, we employed seven distinct software tools: OptiType, HLA-VBseq, HISAT genotype, SpecHLA, T1K, QzType, and DRAGEN. Each tool offers unique capabilities and algorithms for HLA genotyping, allowing comprehensive analysis and validation of the results. We then compared these results with benchmarks derived from personal genome assemblies. Our findings present a comprehensive four-field-resolution HLA allele annotation for 44 HPRC samples. Significantly, our innovative targeted next-generation sequencing (NGS) approach for HLA genes showed superior accuracy compared with conventional short-read WGS. An integrated analysis involving QzType, T1K, and DRAGEN was developed, achieving 100% accuracy for all 11 HLA genes. In conclusion, our study highlighted the combination of targeted short-read sequencing and astute computational analysis as a robust approach for HLA genotyping. Furthermore, the HPRC cohort has emerged as a valuable assembly-based reference in this realm.
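The comparison of tool output against an assembly-derived benchmark described in this abstract can be illustrated with a minimal sketch. It assumes allele names follow the usual `GENE*f1:f2:f3:f4` pattern and that calls and benchmark are keyed by (sample, gene). The function names and data layout are hypothetical, not taken from the paper, and expression suffixes such as `N` are not handled.

```python
def truncate_allele(allele: str, fields: int) -> str:
    """Trim an HLA allele name such as 'A*01:01:01:01' to its first `fields` fields."""
    gene, _, rest = allele.partition("*")
    return f"{gene}*{':'.join(rest.split(':')[:fields])}"


def genotype_matches(called, truth, fields: int) -> int:
    """Count called alleles that match the benchmark genotype, ignoring allele order."""
    remaining = [truncate_allele(a, fields) for a in truth]
    hits = 0
    for allele in called:
        key = truncate_allele(allele, fields)
        if key in remaining:
            remaining.remove(key)  # each benchmark allele can be matched only once
            hits += 1
    return hits


def accuracy(calls: dict, benchmark: dict, fields: int) -> float:
    """Fraction of benchmark alleles recovered over all (sample, gene) keys."""
    total = hits = 0
    for key, truth in benchmark.items():
        hits += genotype_matches(calls.get(key, ()), truth, fields)
        total += len(truth)
    return hits / total if total else 0.0
```

Scoring the same calls at two fields and at four fields shows how a tool can look accurate at low resolution while still missing the full four-field allele.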
Affiliation(s)
- Sheng-Kai Lai
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Allen Chilun Luo
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- I-Hsuan Chiu
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Hui-Wen Chuang
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Ting-Hsuan Chou
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Tsung-Kai Hung
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Jacob Shujui Hsu
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Chien-Yu Chen
- Department of Biomechatronics Engineering, National Taiwan University, Taipei, Taiwan
- Wei-Shiung Yang
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Ya-Chien Yang
- Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, National Taiwan University, Taipei, Taiwan
- Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Pei-Lung Chen
- Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Medical Genomics and Proteomics, College of Medicine, National Taiwan University, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
2
Dashti M, Malik MZ, Nizam R, Jacob S, Al-Mulla F, Thanaraj TA. Evaluation of HLA typing content of next-generation sequencing datasets from family trios and individuals of Arab ethnicity. Front Genet 2024; 15:1407285. [PMID: 38859936 PMCID: PMC11163123 DOI: 10.3389/fgene.2024.1407285] [Received: 03/28/2024] [Accepted: 05/07/2024] [Indexed: 06/12/2024]
Abstract
Introduction: HLA typing is a critical tool in both clinical and research applications at the individual and population levels. Benchmarking studies have indicated HLA-HD as the preferred tool for accurate and comprehensive HLA allele calling. The advent of next-generation sequencing (NGS) has revolutionized genetic analysis by providing high-throughput sequencing data. This study aims to evaluate, using the HLA-HD tool, the HLA typing content of whole exome, whole genome, and HLA-targeted panel sequence data from the consanguineous population of Arab ethnicity, which has been underrepresented in prior benchmarking studies. Methods: We utilized sequence data from family trios and individuals, sequenced on one or more of the whole exome, whole genome, and HLA-targeted panel sequencing technologies. The performance and resolution across various HLA genes were evaluated. We incorporated a comparative quality control analysis, assessing the results obtained from HLA-HD by comparing them with those from the HLA-Twin tool to authenticate the accuracy of the findings. Results: Our analysis found that alleles across 29 HLA loci can be successfully and consistently typed from NGS datasets. Clinical-grade whole exome sequencing datasets achieved the highest consistency rate at three-field resolution, followed by targeted HLA panel, research-grade whole exome, and whole genome datasets. Discussion: The study catalogues HLA typing consistency across NGS datasets for a large array of HLA genes and highlights assessments regarding the feasibility of utilizing available NGS datasets in HLA allele studies. These findings underscore the reliability of HLA-HD for HLA typing in underrepresented populations and demonstrate the utility of various NGS technologies in achieving accurate HLA allele calling.
Affiliation(s)
- Fahd Al-Mulla
- Department of Genetics and Bioinformatics, Dasman Diabetes Institute, Kuwait City, Kuwait
3
Krishna C, Tervi A, Saffern M, Wilson EA, Yoo SK, Mars N, Roudko V, Cho BA, Jones SE, Vaninov N, Selvan ME, Gümüş ZH, Lenz TL, Merad M, Boffetta P, Martínez-Jiménez F, Ollila HM, Samstein RM, Chowell D. An immunogenetic basis for lung cancer risk. Science 2024; 383:eadi3808. [PMID: 38386728 DOI: 10.1126/science.adi3808] [Received: 04/21/2023] [Accepted: 01/16/2024] [Indexed: 02/24/2024]
Abstract
Cancer risk is influenced by inherited mutations, DNA replication errors, and environmental factors. However, the influence of genetic variation in immunosurveillance on cancer risk is not well understood. Leveraging population-level data from the UK Biobank and FinnGen, we show that heterozygosity at the human leukocyte antigen (HLA)-II loci is associated with reduced lung cancer risk in smokers. Fine-mapping implicated amino acid heterozygosity in the HLA-II peptide binding groove in reduced lung cancer risk, and single-cell analyses showed that smoking drives enrichment of proinflammatory lung macrophages and HLA-II+ epithelial cells. In lung cancer, widespread loss of HLA-II heterozygosity (LOH) favored loss of alleles with larger neopeptide repertoires. Thus, our findings nominate genetic variation in immunosurveillance as a critical risk factor for lung cancer.
Affiliation(s)
- Chirag Krishna
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Anniina Tervi
- Institute for Molecular Medicine, Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00290, Finland
- Miriam Saffern
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eric A Wilson
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Seong-Keun Yoo
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Nina Mars
- Institute for Molecular Medicine, Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00290, Finland
- Vladimir Roudko
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Byuri Angela Cho
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Samuel Edward Jones
- Institute for Molecular Medicine, Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00290, Finland
- Natalie Vaninov
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Myvizhi Esai Selvan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zeynep H Gümüş
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Center for Thoracic Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Tobias L Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, Universität Hamburg, 20146 Hamburg, Germany
- Miriam Merad
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Division of Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Human Immune Monitoring Center, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paolo Boffetta
- Department of Medical and Surgical Sciences, Alma Mater Studiorum University of Bologna, 40138 Bologna, Italy
- Stony Brook Cancer Center, Stony Brook University, New York, NY 11794, USA
- Francisco Martínez-Jiménez
- Vall d'Hebron Institute of Oncology, Barcelona 08035, Spain
- Hartwig Medical Foundation, Amsterdam 1098 XH, the Netherlands
- Hanna M Ollila
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute for Molecular Medicine, Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00290, Finland
- Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Robert M Samstein
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Center for Thoracic Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Radiation Oncology, Mount Sinai Hospital, New York, NY 10029, USA
- Diego Chowell
- The Marc and Jennifer Lipschultz Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Immunology and Immunotherapy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
4
Marzouka NAD, Alnaqbi H, Al-Aamri A, Tay G, Alsafar H. Investigating the genetic makeup of the major histocompatibility complex (MHC) in the United Arab Emirates population through next-generation sequencing. Sci Rep 2024; 14:3392. [PMID: 38337023 PMCID: PMC10858242 DOI: 10.1038/s41598-024-53986-1] [Received: 11/26/2023] [Accepted: 02/07/2024] [Indexed: 02/12/2024]
Abstract
The Human leukocyte antigen (HLA) molecules are central to immune response and have associations with the phenotypes of various diseases and induced drug toxicity. Further, the role of HLA molecules in presenting antigens significantly affects the transplantation outcome. The objective of this study was to examine the extent of the diversity of HLA alleles in the population of the United Arab Emirates (UAE) using Next-Generation Sequencing methodologies and encompassing a larger cohort of individuals. A cohort of 570 unrelated healthy citizens of the UAE volunteered to provide samples for Whole Genome Sequencing and Whole Exome Sequencing. The definition of the HLA alleles was achieved through the application of the bioinformatics tools, HLA-LA and xHLA. Subsequently, the findings from this study were compared with other local and international datasets. A broad range of HLA alleles in the UAE population, of which some were previously unreported, was identified. A comparison with other populations confirmed the current population's unique intertwined genetic heritage while highlighting similarities with populations from the Middle East region. Some disease-associated HLA alleles were detected at a frequency of > 5%, such as HLA-B*51:01, HLA-DRB1*03:01, HLA-DRB1*15:01, and HLA-DQB1*02:01. The increase in allele homozygosity, especially for HLA class I genes, was identified in samples with a higher level of genome-wide homozygosity. This highlights a possible effect of consanguinity on the HLA homozygosity. The HLA allele distribution in the UAE population showcases a unique profile, underscoring the need for tailored databases for traditional activities such as unrelated transplant matching and for newer initiatives in precision medicine based on specific populations. 
This research is part of a concerted effort to improve the knowledge base, particularly in the fields of transplant medicine and investigating disease associations as well as in understanding human migration patterns within the Arabian Peninsula and surrounding regions.
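The allele-frequency threshold (> 5%) and the homozygosity comparison mentioned in this abstract can be made concrete with a small sketch. It assumes each individual is represented by a pair of allele names at one locus. The function names and input layout are illustrative assumptions and do not reflect the authors' pipeline.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return allele -> frequency, counting two alleles per individual."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


def common_alleles(genotypes, threshold=0.05):
    """List alleles whose frequency exceeds `threshold`, sorted by name."""
    freqs = allele_frequencies(genotypes)
    return sorted(allele for allele, f in freqs.items() if f > threshold)


def homozygosity_rate(genotypes):
    """Fraction of individuals carrying two identical alleles at the locus."""
    if not genotypes:
        return 0.0
    return sum(1 for a, b in genotypes if a == b) / len(genotypes)
```

Computing `homozygosity_rate` separately for groups with high and low genome-wide homozygosity would be one way to examine the consanguinity effect the abstract describes.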
Affiliation(s)
- Nour Al Dain Marzouka
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Halima Alnaqbi
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Amira Al-Aamri
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Guan Tay
- Division of Psychiatry, Faculty of Health and Medical Sciences, Medical School, The University of Western Australia, Crawley, WA, Australia
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Habiba Alsafar
- Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
5
Zhou Y, Song L, Li H. Full resolution HLA and KIR genes annotation for human genome assemblies. bioRxiv: the preprint server for biology 2024:2024.01.20.576452. [PMID: 38328160 PMCID: PMC10849470 DOI: 10.1101/2024.01.20.576452] [Indexed: 02/09/2024]
Abstract
The HLA (Human Leukocyte Antigen) genes and the KIR (Killer cell Immunoglobulin-like Receptor) genes are critical to immune responses and are associated with many immune-related diseases. Located in highly polymorphic regions, they are hard to study with traditional short-read alignment-based methods. Although modern long-read assemblers can often assemble these genes, using existing tools to annotate HLA and KIR genes in these assemblies remains a non-trivial task. Here, we describe Immuannot, a new computational tool to annotate the gene structures of HLA and KIR genes and to type the allele of each gene. Applying Immuannot to 56 regional and 212 whole-genome assemblies from previous studies, we annotated 9,931 HLA and KIR genes and found that almost half of these genes, 4,068, had novel sequences compared to the current Immuno Polymorphism Database (IPD). These novel gene sequences were represented by 2,664 distinct alleles, some of which contained non-synonymous variations resulting in 92 novel protein sequences. We demonstrated the complex haplotype structures at the two loci and reported the linkage between HLA/KIR haplotypes and gene alleles. We anticipate that Immuannot will speed up the discovery of new HLA/KIR alleles and enable the association of HLA/KIR haplotype structures with clinical outcomes in the future.
Affiliation(s)
- Ying Zhou
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Li Song
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH, 03755, USA
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
6
Yu D, Ayyala R, Sadek SH, Chittampalli L, Farooq H, Jung J, Nahid AA, Boldirev G, Jung M, Park S, Nguyen A, Zelikovsky A, Mancuso N, Joo JWJ, Thompson RF, Alachkar H, Mangul S. A rigorous benchmarking of alignment-based HLA typing algorithms for RNA-seq data. bioRxiv: the preprint server for biology 2024:2023.05.22.541750. [PMID: 38293199 PMCID: PMC10827116 DOI: 10.1101/2023.05.22.541750] [Indexed: 02/01/2024]
Abstract
Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability for clinical and biomedical research to make informed choices regarding which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with molecularly defined gold standard at 5 loci, HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess their accuracy, compare default with optimized parameters, and examine for discrepancies in accuracy at the allele and loci levels. We will also evaluate the computational expense of each HLA caller measured in terms of CPU time and RAM. We also plan to evaluate the influence of read length over the HLA region on accuracy for each tool. Most notably, we will examine the performance of HLA callers across European and African groups, to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-Seq HLA callers are capable of returning high-quality results, but the tools that offer a good balance between accuracy and computational expensiveness for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.
Affiliation(s)
- Dottie Yu
- Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts and Sciences, University of Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA
- Ram Ayyala
- Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts and Sciences, University of Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA
- Sarah Hany Sadek
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Department of Biology, and Department of Computer Science, California State University, Fullerton, Fullerton, CA 92831
- Likhitha Chittampalli
- Department of Computer Science, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Hafsa Farooq
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
- Junghyun Jung
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Abdullah Al Nahid
- Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet 3114, Bangladesh
- Grigore Boldirev
- Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, GA, 30303, USA
- Mina Jung
- Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts and Sciences, University of Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA
- Sungmin Park
- Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620, South Korea
- Austin Nguyen
- Computational Biologist, Immune Monitoring & Cancer Omics Oregon Health & Science University, Biomedical Engineering, 3181 S.W. Sam Jackson Park Road Portland, OR 97239-3098
- Alex Zelikovsky
- Department of Computer Science, College of Arts and Sciences, Georgia State University, Atlanta, GA, 30303, USA
- Nicholas Mancuso
- Assistant Professor of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 1845 N. Soto Street, USA
- Jong Wha J Joo
- Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620, South Korea
- Division of AI Software Convergence, Dongguk University-Seoul, Seoul, 04620, South Korea
- Reid F Thompson
- Assistant Professor of Radiation Medicine, School of Medicine, OHSU, Portland, OR 97239
- Assistant Professor of Biomedical Engineering, School of Medicine, OHSU, Portland, OR 97239
- Staff Physician, VA Portland Healthcare System, Portland OR 97239
- Houda Alachkar
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, CA, USA
- Serghei Mangul
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles
7
Thuesen NH, Klausen MS. Benchmarking NGS-Based HLA Typing Algorithms. Methods Mol Biol 2024; 2809:87-99. [PMID: 38907892 DOI: 10.1007/978-1-0716-3874-3_6] [Indexed: 06/24/2024]
Abstract
Knowledge of the expected accuracy of HLA typing algorithms is important when choosing between algorithms and when evaluating the HLA typing predictions of an algorithm. This chapter guides the reader through an example benchmarking study that evaluates the performances of four NGS-based HLA typing algorithms as well as outlining factors to consider, when designing and running such a benchmarking study. The code related to this benchmarking workflow can be found at https://github.com/nikolasthuesen/springers-hla-benchmark/ .
8
Song L, Bai G, Liu XS, Li B, Li H. Efficient and accurate KIR and HLA genotyping with massively parallel sequencing data. Genome Res 2023; 33:923-931. [PMID: 37169596 PMCID: PMC10519407 DOI: 10.1101/gr.277585.122] [Received: 12/11/2022] [Accepted: 05/04/2023] [Indexed: 05/13/2023]
Abstract
Killer cell immunoglobulin like receptor (KIR) genes and human leukocyte antigen (HLA) genes play important roles in innate and adaptive immunity. They are highly polymorphic and cannot be genotyped with standard variant calling pipelines. Compared with HLA genes, many KIR genes are similar to each other in sequences and may be absent in the chromosomes. Therefore, although many tools have been developed to genotype HLA genes using common sequencing data, none of them work for KIR genes. Even specialized KIR genotypers could not resolve all the KIR genes. Here we describe T1K, a novel computational method for the efficient and accurate inference of KIR or HLA alleles from RNA-seq, whole-genome sequencing, or whole-exome sequencing data. T1K jointly considers alleles across all genotyped genes, so it can reliably identify present genes and distinguish homologous genes, including the challenging KIR2DL5A/KIR2DL5B genes. This model also benefits HLA genotyping, where T1K achieves high accuracy in benchmarks. Moreover, T1K can call novel single-nucleotide variants and process single-cell data. Applying T1K to tumor single-cell RNA-seq data, we found that KIR2DL4 expression was enriched in tumor-specific CD8+ T cells. T1K may open the opportunity for HLA and KIR genotyping across various sequencing applications.
Affiliation(s)
- Li Song
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Gali Bai
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- X Shirley Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Bo Li
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
9
Admon A. The biogenesis of the immunopeptidome. Semin Immunol 2023; 67:101766. [PMID: 37141766 DOI: 10.1016/j.smim.2023.101766] [Received: 03/08/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/06/2023]
Abstract
The immunopeptidome is the repertoire of peptides bound and presented by the MHC class I, class II, and non-classical molecules. The peptides are produced by the degradation of most cellular proteins, and in some cases, peptides are produced from extracellular proteins taken up by the cells. This review attempts to first describe some of its known and well-accepted concepts, and next, raise some questions about a few of the established dogmas in this field: The production of novel peptides by splicing is questioned, suggesting here that spliced peptides are extremely rare, if existent at all. The degree of the contribution to the immunopeptidome by degradation of cellular protein by the proteasome is doubted, therefore this review attempts to explain why it is likely that this contribution to the immunopeptidome is possibly overstated. The contribution of defective ribosome products (DRiPs) and non-canonical peptides to the immunopeptidome is noted and methods are suggested to quantify them. In addition, the common misconception that the MHC class II peptidome is mostly derived from extracellular proteins is noted, and corrected. It is stressed that the confirmation of sequence assignments of non-canonical and spliced peptides should rely on targeted mass spectrometry using spiking-in of heavy isotope-labeled peptides. Finally, the new methodologies and modern instrumentation currently available for high throughput kinetics and quantitative immunopeptidomics are described. These advanced methods open up new possibilities for utilizing the big data generated and taking a fresh look at the established dogmas and reevaluating them critically.
Affiliation(s)
- Arie Admon
- Faculty of Biology, Technion-Israel Institute of Technology, Israel
10
Thuesen NH, Klausen MS, Gopalakrishnan S, Trolle T, Renaud G. Benchmarking freely available HLA typing algorithms across varying genes, coverages and typing resolutions. Front Immunol 2022; 13:987655. [PMID: 36426357 PMCID: PMC9679531 DOI: 10.3389/fimmu.2022.987655] [Received: 07/06/2022] [Accepted: 10/10/2022] [Indexed: 11/02/2023]
Abstract
Identifying the specific human leukocyte antigen (HLA) allele combination of an individual is crucial in organ donation, risk assessment of autoimmune and infectious diseases and cancer immunotherapy. However, due to the high genetic polymorphism in this region, HLA typing requires specialized methods. We investigated the performance of five next-generation sequencing (NGS) based HLA typing tools with a non-restricted license namely HLA*LA, Optitype, HISAT-genotype, Kourami and STC-Seq. This evaluation was done for the five HLA loci, HLA-A, -B, -C, -DRB1 and -DQB1 using whole-exome sequencing (WES) samples from 829 individuals. The robustness of the tools to lower depth of coverage (DOC) was evaluated by subsampling and HLA typing 230 WES samples at DOC ranging from 1X to 100X. The HLA typing accuracy was measured across four typing resolutions. Among these, we present two clinically-relevant typing resolutions (P group and pseudo-sequence), which specifically focus on the peptide binding region. On average, across the five HLA loci examined, HLA*LA was found to have the highest typing accuracy. For the individual loci, HLA-A, -B and -C, Optitype's typing accuracy was the highest and HLA*LA had the highest typing accuracy for HLA-DRB1 and -DQB1. The tools' robustness to lower DOC data varied widely and further depended on the specific HLA locus. For all Class I loci, Optitype had a typing accuracy above 95% (according to the modification of the amino acids in the functionally relevant portion of the HLA molecule) at 50X, but increasing the DOC beyond even 100X could still improve the typing accuracy of HISAT-genotype, Kourami, and STC-seq across all five HLA loci as well as HLA*LA's typing accuracy for HLA-DQB1. HLA typing is also used in studies of ancient DNA (aDNA), which is often based on sequencing data with lower quality and DOC. 
Interestingly, we found that Optitype's typing accuracy is not notably impaired by short read length or by DNA damage, which is typical of aDNA, as long as the DOC is sufficiently high.
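The subsampling to lower depth of coverage (DOC) described in this abstract comes down to keeping a fraction of reads equal to the target DOC divided by the original DOC. The sketch below illustrates that arithmetic with a seeded random draw. The abstract does not say how the subsampling was actually performed, so the function names and the per-read sampling scheme are assumptions, and paired-end data would need to be sampled per read pair rather than per read.

```python
import random


def subsample_fraction(current_doc: float, target_doc: float) -> float:
    """Fraction of reads to keep to go from `current_doc` to `target_doc`."""
    if current_doc <= 0:
        raise ValueError("current_doc must be positive")
    # Never keep more than everything: a target above the current DOC is capped.
    return min(1.0, target_doc / current_doc)


def subsample_reads(reads, current_doc: float, target_doc: float, seed: int = 0):
    """Keep each read independently with the probability computed above."""
    fraction = subsample_fraction(current_doc, target_doc)
    rng = random.Random(seed)  # fixed seed so a benchmark run is reproducible
    return [read for read in reads if rng.random() < fraction]
```

Repeating the draw with several seeds at each DOC level gives an estimate of how much typing accuracy varies from sampling noise alone.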
Affiliation(s)
- Nikolas Hallberg Thuesen
- Evaxion Biotech, Copenhagen, Denmark
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Shyam Gopalakrishnan
- Section for Hologenomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Gabriel Renaud
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark