1
Servati M, Vaccaro CN, Diller EE, Pellegrino Da Silva R, Mafra F, Cao S, Stanley KB, Cohen-Gadol AA, Parker JG. Metabolic Insight into Glioma Heterogeneity: Mapping Whole Exome Sequencing to In Vivo Imaging with Stereotactic Localization and Deep Learning. Metabolites 2024; 14:337. [PMID: 38921472 PMCID: PMC11205750 DOI: 10.3390/metabo14060337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/07/2024] [Accepted: 06/12/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
Intratumoral heterogeneity (ITH) complicates the diagnosis and treatment of glioma, partly due to the diverse metabolic profiles driven by underlying genomic alterations. While multiparametric imaging enhances the characterization of ITH by capturing both spatial and functional variations, it falls short in directly assessing the metabolic activities that underpin these phenotypic differences. This gap stems from the challenge of integrating easily accessible, colocated pathology and detailed genomic data with metabolic insights. This study presents a multifaceted approach combining stereotactic biopsy with standard clinical open-craniotomy for sample collection, voxel-wise analysis of MR images, a regression-based generalized additive model (GAM), and whole-exome sequencing. This work aims to demonstrate the potential of machine learning algorithms to predict variations in cellular and molecular tumor characteristics. This retrospective study enrolled ten treatment-naïve patients with radiologically confirmed glioma. Each patient underwent a multiparametric MR scan (T1W, T1W-CE, T2W, T2W-FLAIR, DWI) prior to surgery. During standard craniotomy, at least one stereotactic biopsy was collected from each patient, with screenshots of the sample locations saved for spatial registration to pre-surgical MR data. Whole-exome sequencing was performed on flash-frozen tumor samples, prioritizing the signatures of five glioma-related genes: IDH1, TP53, EGFR, PIK3CA, and NF1. Regression was implemented with a GAM using a univariate shape function for each predictor. Standard receiver operating characteristic (ROC) analyses were used to evaluate detection, with the area under the curve (AUC) calculated for each gene target and MR contrast combination. The mean AUC across five gene targets and 31 MR contrast combinations was 0.75 ± 0.11; individual AUCs were as high as 0.96 for both IDH1 and TP53 with T2W-FLAIR and ADC, and 0.99 for EGFR with T2W and ADC.
These results suggest the possibility of predicting exome-wide mutation events from noninvasive, in vivo imaging by combining stereotactic localization of glioma samples and a semi-parametric deep learning method. The genomic alterations identified, particularly in IDH1, TP53, EGFR, PIK3CA, and NF1, are known to play pivotal roles in metabolic pathways driving glioma heterogeneity. Our methodology, therefore, indirectly sheds light on the metabolic landscape of glioma through the lens of these critical genomic markers, suggesting a complex interplay between tumor genomics and metabolism. This approach holds potential for refining targeted therapy by better addressing the genomic heterogeneity of glioma tumors.
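The count of 31 MR contrast combinations is consistent with taking every non-empty subset of five input contrasts (2^5 − 1 = 31), although the abstract does not state how the combinations were formed. The sketch below, under that assumption, enumerates the subsets and computes a rank-based AUC on toy data. The contrast names come from the abstract; the helper names `contrast_combinations` and `rank_auc`, and the toy labels and scores, are hypothetical and are not the authors' GAM pipeline.

```python
from itertools import combinations

# Contrast names as listed in the abstract. Treating every non-empty subset
# as one "combination" is an assumption that reproduces the count of 31.
CONTRASTS = ["T1W", "T1W-CE", "T2W", "T2W-FLAIR", "DWI"]


def contrast_combinations(contrasts):
    """Return every non-empty subset of the given contrasts."""
    return [
        combo
        for size in range(1, len(contrasts) + 1)
        for combo in combinations(contrasts, size)
    ]


def rank_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


combos = contrast_combinations(CONTRASTS)
print(len(combos))

# Hypothetical mutation labels (1 = mutated) and model scores for six samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(rank_auc(toy_labels, toy_scores))
```

In the toy data, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.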
Affiliation(s)
- Mahsa Servati
  - Radiology and Imaging Sciences, School of Medicine, Indiana University, 950 W. Walnut St., R2 E107, Indianapolis, IN 46202, USA (J.G.P.)
  - School of Health Sciences, Purdue University, West Lafayette, IN 47907, USA
- Courtney N. Vaccaro
  - Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Emily E. Diller
  - Feinberg School of Medicine, Northwestern Medicine, Chicago, IL 60611, USA
- Sha Cao
  - Radiology and Imaging Sciences, School of Medicine, Indiana University, 950 W. Walnut St., R2 E107, Indianapolis, IN 46202, USA (J.G.P.)
- Katherine B. Stanley
  - Radiology and Imaging Sciences, School of Medicine, Indiana University, 950 W. Walnut St., R2 E107, Indianapolis, IN 46202, USA (J.G.P.)
- Aaron A. Cohen-Gadol
  - Radiology and Imaging Sciences, School of Medicine, Indiana University, 950 W. Walnut St., R2 E107, Indianapolis, IN 46202, USA (J.G.P.)
- Jason G. Parker
  - Radiology and Imaging Sciences, School of Medicine, Indiana University, 950 W. Walnut St., R2 E107, Indianapolis, IN 46202, USA (J.G.P.)
  - School of Health Sciences, Purdue University, West Lafayette, IN 47907, USA
2
Aisagbonhi O, Ghlichloo I, Hong DS, Roma A, Fadare O, Eskander R, Saenz C, Fisch KM, Song W. Comprehensive next-generation sequencing identifies novel putative pathogenic or likely pathogenic germline variants in patients with concurrent tubo-ovarian and endometrial serous and endometrioid carcinomas or precursors. Gynecol Oncol 2024; 187:241-248. [PMID: 38833993 DOI: 10.1016/j.ygyno.2024.05.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Revised: 05/22/2024] [Accepted: 05/23/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND Endometrial serous carcinoma (ESC) and tubo-ovarian high-grade serous carcinoma (HGSC) are characterized by late-stage presentation and high mortality. Current guidelines for prevention recommend risk-reducing salpingo-oophorectomy (RRSO) in patients with hereditary mutations in cancer susceptibility genes. However, HGSC displays extensive genetic heterogeneity, with alterations in 168 genes identified in the TCGA study, whereas current germline testing panels are often limited to the handful of recurrently mutated genes, leaving families with rare hereditary gene mutations potentially at risk. OBJECTIVE To determine if there are rare germline mutations that may aid in early identification of more patients at risk for ESC and/or HGSC by evaluating patients with concurrent ESC, HGSC or precursor lesions, and endometrial atypical hyperplasia (CAH) or low-grade endometrial endometrioid adenocarcinoma (LGEEA). METHODS We performed targeted next-generation sequencing using TSO 500, a 523-gene panel, on formalin-fixed paraffin-embedded tumor and matched benign non-tumor tissue blocks from 5 patients with concurrent ESC, HGSC or precursor lesions, and CAH or LGEEA. RESULTS We identified germline pathogenic, likely pathogenic or uncertain significance variants in cancer susceptibility genes in 4 of 5 patients; affected genes included GLI1, PIK3R1, FOXP1, FANCD2, INPP4B and H3F3C. Notably, none of these genes were included in the commercially available germline testing panels initially used to evaluate the patients at the time of their diagnoses. CONCLUSION Comprehensive germline testing of patients with concurrent LGEEA or CAH and ESC, HGSC or precursor lesions may aid in early identification of relatives at risk for cancer who may be candidates for RRSO with hysterectomy.
Affiliation(s)
- Omonigho Aisagbonhi
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
- Ida Ghlichloo
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA
- Duncan S Hong
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, USA; Division of Blood and Marrow Transplantation, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Andres Roma
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA
- Oluwole Fadare
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA
- Ramez Eskander
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- Cheryl Saenz
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- Kathleen M Fisch
  - Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA; Center for Computational Biology and Bioinformatics, University of California, San Diego, La Jolla, CA, USA
- Wei Song
  - Department of Pathology, University of California San Diego, La Jolla, CA, USA
3
Kalleberg J, Rissman J, Schnabel RD. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain. bioRxiv: the preprint server for biology 2024:2024.04.15.589602. [PMID: 38659907 PMCID: PMC11042298 DOI: 10.1101/2024.04.15.589602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Variant calling across diverse species remains challenging as most bioinformatics tools default to assumptions based on human genomes. DeepVariant (DV) excels without joint genotyping while offering fewer implementation barriers. However, the growing appeal of a "universal" algorithm has magnified the unknown impacts when used with non-human genomes. Here, we use bovine genomes to assess the limits of human-genome-trained models in other species. We introduce the first multi-species DV model that achieves a lower Mendelian Inheritance Error (MIE) rate during single-sample genotyping. Our novel approach, TrioTrain, automates extending DV for species without Genome In A Bottle (GIAB) resources and uses region shuffling to mitigate barriers for SLURM-based clusters. To offset imperfect truth labels for animal genomes, we remove Mendelian discordant variants before training, where models are tuned to genotype the offspring correctly. With TrioTrain, we use cattle, yak, and bison trios to build 30 model iterations across five phases. We observe remarkable performance across phases when testing the GIAB human trios with a mean SNP F1 score >0.990. In HG002, our phase 4 bovine model identifies more variants at a lower MIE rate than DeepTrio. In bovine F1-hybrid genomes, our model substantially reduces inheritance errors with a mean MIE rate of 0.03 percent. Although constrained by imperfect labels, we find that multi-species, trio-based training produces a robust variant calling model. Our research demonstrates that exclusively training with human genomes restricts the application of deep-learning approaches for comparative genomics.
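The Mendelian Inheritance Error (MIE) rate used above as a quality measure can be illustrated with a minimal sketch. It assumes diploid autosomal sites with complete genotypes and ignores de novo mutations. The function names and the toy trio genotypes are hypothetical and do not come from TrioTrain or DeepVariant.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele of each parent."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def mie_rate(trio_genotypes):
    """Fraction of sites whose (child, mother, father) genotypes violate Mendelian inheritance."""
    if not trio_genotypes:
        raise ValueError("no sites to evaluate")
    errors = sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_genotypes
    )
    return errors / len(trio_genotypes)


# Hypothetical biallelic sites: 0 = reference allele, 1 = alternate allele.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: one allele from each parent
    ((1, 1), (0, 0), (0, 1)),  # error: mother cannot supply allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # error: neither parent carries allele 1
]
print(mie_rate(sites))
```

Two of the four toy sites are discordant, so the rate here is 0.5; a production implementation would also need to handle missing calls, sex chromosomes, and multi-allelic sites.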
Affiliation(s)
- Jenna Kalleberg
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Jacob Rissman
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Robert D Schnabel
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
  - University of Missouri, Genetics Area Program, Columbia, MO, 65201 USA
4
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] [Open Access]
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
5
Ergun MA, Cinal O, Bakışlı B, Emül AA, Baysan M. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
BACKGROUND Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. These operations must be performed computationally because of the size and complexity of the data. The need for analysis software has resulted in many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies. RESULTS Here, we propose Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis and their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are packed as Docker containers as well.
Pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding through modular structure. CONCLUSIONS COSAP simplifies and speeds up the process of DNA sequencing analyses providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis as well as their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make comparisons much easier to assess the impact of alternative pipelines which is crucial in establishing reproducibility of sequencing analyses.
Affiliation(s)
- Mehmet Arif Ergun
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Omer Cinal
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Berkant Bakışlı
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Abdullah Asım Emül
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Baysan
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
6
Fasaludeen A, McTague A, Jose M, Banerjee M, Sundaram S, Madhusoodanan UK, Radhakrishnan A, Menon RN. Genetic variant interpretation for the neurologist - A pragmatic approach in the next-generation sequencing era in childhood epilepsy. Epilepsy Res 2024; 201:107341. [PMID: 38447235 DOI: 10.1016/j.eplepsyres.2024.107341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 02/14/2024] [Accepted: 02/29/2024] [Indexed: 03/08/2024]
Abstract
Genetic advances over the past decade have enhanced our understanding of the genetic landscape of childhood epilepsy. However, a major challenge for clinicians has been understanding the rationale for, and a systematic approach to, interpreting the clinical significance of the variant(s) detected in their patients. As the clinical paradigm evolves from gene panels to whole exome or whole genome testing, including rapid genome sequencing, the number of patients tested and variants identified per patient will only increase. Each step in the process of variant interpretation has limitations, and there is no single criterion which enables the clinician to draw reliable conclusions on a causal relationship between the variant and disease without robust clinical phenotyping. Although many automated online analysis software tools are available, these carry a risk of misinterpretation. This guideline provides a pragmatic, real-world approach to variant interpretation for the child neurologist. The focus will be on ascertaining aspects such as variant frequency, subtype, inheritance pattern, and structural and functional consequence with regard to genotype-phenotype correlations, while refraining from mere interpretation of the classification provided in a genetic test report. It will not replace the expert advice of colleagues in clinical genetics; however, as genomic investigations become a first-line test for epilepsy, it is vital that neurologists and epileptologists are equipped to navigate this landscape.
Affiliation(s)
- Alfiya Fasaludeen
  - Dept of Neurology, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
- Amy McTague
  - Developmental Neurosciences, UCL Great Ormond Street Institute of Child Health, London, United Kingdom; Department of Neurology, Great Ormond Street Hospital, London, United Kingdom
- Manna Jose
  - Dept of Neurology, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
- Moinak Banerjee
  - Human Molecular Genetics Laboratory, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India
- Soumya Sundaram
  - Dept of Neurology, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
- U K Madhusoodanan
  - Department of Biochemistry, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
- Ashalatha Radhakrishnan
  - Dept of Neurology, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
- Ramshekhar N Menon
  - Dept of Neurology, Sree Chitra Tirunal Institute for Medical Sciences & Technology (SCTIMST), Thiruvananthapuram, Kerala, India
7
Kvapilova K, Misenko P, Radvanszky J, Brzon O, Budis J, Gazdarica J, Pos O, Korabecna M, Kasny M, Szemes T, Kvapil P, Paces J, Kozmik Z. Validated WGS and WES protocols proved saliva-derived gDNA as an equivalent to blood-derived gDNA for clinical and population genomic analyses. BMC Genomics 2024; 25:187. [PMID: 38365587 PMCID: PMC10873937 DOI: 10.1186/s12864-024-10080-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 02/02/2024] [Indexed: 02/18/2024] [Open Access]
Abstract
BACKGROUND Whole exome sequencing (WES) and whole genome sequencing (WGS) have become standard methods in human clinical diagnostics as well as in population genomics (POPGEN). Blood-derived genomic DNA (gDNA) is routinely used in the clinical environment. Conversely, many POPGEN studies and commercial tests benefit from easy saliva sampling. Here, we evaluated the quality of variant call sets and the level of genotype concordance of single nucleotide variants (SNVs) and small insertions and deletions (indels) for WES and WGS using paired blood- and saliva-derived gDNA isolates employing genomic reference-based validated protocols. METHODS The genomic reference standard Coriell NA12878 was repeatedly analyzed using optimized WES and WGS protocols, and data calls were compared with the truth dataset published by the Genome in a Bottle Consortium. gDNA was extracted from the paired blood and saliva samples of 10 participants and processed using the same protocols. A comparison of paired blood-saliva call sets was performed in the context of WGS and WES genomic reference-based technical validation results. RESULTS The quality pattern of called variants obtained from genomic-reference-based technical replicates correlates with data calls of paired blood-saliva-derived samples in all levels of tested examinations despite a higher rate of non-human contamination found in the saliva samples. The F1 score of 10 blood-to-saliva-derived comparisons ranged between 0.8030-0.9998 for SNVs and between 0.8883-0.9991 for small-indels in the case of the WGS protocol, and between 0.8643-0.999 for SNVs and between 0.7781-1.000 for small-indels in the case of the WES protocol. CONCLUSION Saliva may be considered an equivalent material to blood for genetic analysis for both WGS and WES under strict protocol conditions. 
Sequencing metrics and variant-detection accuracy are affected far less by the choice of saliva instead of blood as the gDNA source than by the genomic context, variant types, and the sequencing technology used.
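The F1 scores reported above combine precision and recall into one number. As a minimal sketch, assuming one call set (for example blood) is treated as the reference and the other (saliva) as the query, the metric can be computed from counts of matching and non-matching variants. The function name and the toy counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from variant-comparison counts."""
    called = true_positives + false_positives
    expected = true_positives + false_negatives
    if called == 0 or expected == 0:
        raise ValueError("need at least one called and one expected variant")
    precision = true_positives / called
    recall = true_positives / expected
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: variants shared by both call sets (TP), found only in
# the query set (FP), and found only in the reference set (FN).
precision, recall, f1 = precision_recall_f1(9990, 5, 5)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Because F1 is the harmonic mean of precision and recall, it equals 2·TP / (2·TP + FP + FN), so a shortfall in either precision or recall pulls the score down.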
Affiliation(s)
- Katerina Kvapilova
  - Faculty of Science, Charles University, Albertov 6, Prague, 128 00, Czech Republic
  - Institute of Applied Biotechnologies a.s, Služeb 4, Prague, 108 00, Czech Republic
- Pavol Misenko
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
- Jan Radvanszky
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
  - Institute of Clinical and Translational Research, Biomedical Research Center of the Slovak Academy of Sciences, Dúbravská Cesta 9, Bratislava, 845 05, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 3278/6, Karlova Ves, Bratislava, 841 04, Slovakia
  - Comenius University Science Park, Comenius University, Ilkovičova 8, Karlova Ves, Bratislava, 841 04, Slovakia
- Ondrej Brzon
  - Institute of Applied Biotechnologies a.s, Služeb 4, Prague, 108 00, Czech Republic
- Jaroslav Budis
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
  - Comenius University Science Park, Comenius University, Ilkovičova 8, Karlova Ves, Bratislava, 841 04, Slovakia
  - Slovak Centre for Scientific and Technical Information, Staré Mesto, Lamačská Cesta 8A, Bratislava, 811 04, Slovakia
- Juraj Gazdarica
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
  - Comenius University Science Park, Comenius University, Ilkovičova 8, Karlova Ves, Bratislava, 841 04, Slovakia
  - Slovak Centre for Scientific and Technical Information, Staré Mesto, Lamačská Cesta 8A, Bratislava, 811 04, Slovakia
- Ondrej Pos
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
  - Comenius University Science Park, Comenius University, Ilkovičova 8, Karlova Ves, Bratislava, 841 04, Slovakia
- Marie Korabecna
  - Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University and General University Hospital in Prague, Albertov 4, Prague, 128 00, Czech Republic
- Martin Kasny
  - Institute of Applied Biotechnologies a.s, Služeb 4, Prague, 108 00, Czech Republic
  - Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, Brno, 611 37, Czech Republic
- Tomas Szemes
  - Geneton s.r.o, Ilkovičova 8, Bratislava, 841 04, Slovakia
  - Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovičova 3278/6, Karlova Ves, Bratislava, 841 04, Slovakia
  - Comenius University Science Park, Comenius University, Ilkovičova 8, Karlova Ves, Bratislava, 841 04, Slovakia
- Petr Kvapil
  - Institute of Applied Biotechnologies a.s, Služeb 4, Prague, 108 00, Czech Republic
- Jan Paces
  - Laboratory of Genomics and Bioinformatics, Institute of Molecular Genetics of the Czech Academy of Sciences, Vídeňská 1083, Prague, 142 20, Czech Republic
- Zbynek Kozmik
  - Laboratory of Transcriptional Regulation, Institute of Molecular Genetics of the Czech Academy of Sciences, Vídeňská 1083, Prague, 142 20, Czech Republic
8
Schobers G, Derks R, den Ouden A, Swinkels H, van Reeuwijk J, Bosgoed E, Lugtenberg D, Sun SM, Corominas Galbany J, Weiss M, Blok MJ, Olde Keizer RACM, Hofste T, Hellebrekers D, de Leeuw N, Stegmann A, Kamsteeg EJ, Paulussen ADC, Ligtenberg MJL, Bradley XZ, Peden J, Gutierrez A, Pullen A, Payne T, Gilissen C, van den Wijngaard A, Brunner HG, Nelen M, Yntema HG, Vissers LELM. Genome sequencing as a generic diagnostic strategy for rare disease. Genome Med 2024; 16:32. [PMID: 38355605 PMCID: PMC10868087 DOI: 10.1186/s13073-024-01301-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 02/02/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
BACKGROUND To diagnose the full spectrum of hereditary and congenital diseases, genetic laboratories use many different workflows, ranging from karyotyping to exome sequencing. A single generic high-throughput workflow would greatly increase efficiency. We assessed whether genome sequencing (GS) can replace these existing workflows aimed at germline genetic diagnosis for rare disease. METHODS We performed short-read GS (NovaSeq™ 6000; 150 bp paired-end reads, 37× mean coverage) on 1000 cases with 1271 known clinically relevant variants, identified across different workflows, representative of our tertiary diagnostic centers. Variants were categorized into small variants (single nucleotide variants and indels < 50 bp), large variants (copy number variants and short tandem repeats) and other variants (structural variants and aneuploidies). Variant call format (VCF) files were queried per variant, from which workflow-specific true positive rates (TPRs) for detection were determined. A TPR of ≥ 98% was considered the threshold for transition to GS. A GS-first scenario was generated for our laboratory, using diagnostic efficacy and predicted false negatives as primary outcome measures. As input, we modeled the diagnostic path for all 24,570 individuals referred in 2022, combining the clinical referral, the transition of the underlying workflow(s) to GS, and the variant type(s) to be detected. RESULTS Overall, 95% (1206/1271) of variants were detected. Detection rates differed per variant category: small variants in 96% (826/860), large variants in 93% (341/366), and other variants in 87% (39/45). TPRs varied between workflows (79-100%), with 7/10 being replaceable by GS. Models for our laboratory indicate that a GS-first strategy would be feasible for 84.9% of clinical referrals (750/883), translating to 71% of all individuals (17,444/24,570) receiving GS as their primary test. An estimated false negative rate of 0.3% could be expected.
CONCLUSIONS GS can capture clinically relevant germline variants in a 'GS-first strategy' for the majority of clinical indications in a genetics diagnostic lab.
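The detection rates quoted in the RESULTS follow directly from the reported counts. The short check below recomputes them; all numbers are taken from the abstract, while the dictionary and function names are illustrative assumptions.

```python
# (detected, total) per variant category, as reported in the abstract.
DETECTION_COUNTS = {
    "small": (826, 860),
    "large": (341, 366),
    "other": (39, 45),
}


def percent(numerator, denominator, digits=0):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


category_rates = {
    name: percent(detected, total)
    for name, (detected, total) in DETECTION_COUNTS.items()
}
total_detected = sum(detected for detected, _ in DETECTION_COUNTS.values())
total_variants = sum(total for _, total in DETECTION_COUNTS.values())
overall_rate = percent(total_detected, total_variants)

# GS-first feasibility figures from the modeled 2022 referrals.
referral_rate = percent(750, 883, digits=1)
individual_rate = percent(17444, 24570)

print(category_rates, total_detected, total_variants, overall_rate)
print(referral_rate, individual_rate)
```

The category counts sum to the overall 1206 of 1271 variants, and each rounded percentage matches the value stated in the abstract.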
Affiliation(s)
- Gaby Schobers
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Ronny Derks
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Amber den Ouden
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Hilde Swinkels
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Jeroen van Reeuwijk
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Ermanno Bosgoed
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Su Ming Sun
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Jordi Corominas Galbany
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Marjan Weiss
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Marinus J Blok
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Richelle A C M Olde Keizer
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Tom Hofste
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Debby Hellebrekers
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Nicole de Leeuw
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Alexander Stegmann
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Aimee D C Paulussen
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Marjolijn J L Ligtenberg
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Christian Gilissen
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Han G Brunner
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
  - Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, Netherlands
- Marcel Nelen
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
- Helger G Yntema
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
- Lisenka E L M Vissers
  - Department of Human Genetics, Radboudumc, Nijmegen, Netherlands
  - Research Institute for Medical Innovation, Radboudumc, Nijmegen, Netherlands
|
9
|
Petrin AL, Machado-Paula L, Hinkle A, Hovey L, Awotoye W, Chimenti M, Darbro B, Ribeiro-Bicudo LA, Dabdoub SM, Peter T, Murray J, Van Otterloo E, Rengasamy Venugopalan S, Moreno-Uribe LM. Whole genome sequencing of a family with autosomal dominant features within the oculoauriculovertebral spectrum. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.07.24301824. [PMID: 38370836 PMCID: PMC10871465 DOI: 10.1101/2024.02.07.24301824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/20/2024]
Abstract
Background Oculoauriculovertebral Spectrum (OAVS) encompasses a wide variety of anomalies on derivatives from the first and second pharyngeal arches including macrostomia, hemifacial microsomia, micrognathia, preauricular tags, ocular and vertebral anomalies. We present the genetic findings of a large three-generation family with multiple members affected with macrostomia, preauricular tags and uni- or bilateral ptosis following an autosomal dominant segregation pattern. Methods We generated whole genome sequencing data for the proband, affected parent and unaffected paternal grandparent followed by Sanger sequencing on 23 family members for the top 10 candidate genes: KCND2, PDGFRA, CASP9, NCOA3, WNT10A, SIX1, MTF1, KDR/VEGFR2, LRRK1, and TRIM2. We performed parent and sibling-based transmission disequilibrium tests and burden analysis to explore segregation and burden of candidate gene mutations. Bioinformatic analyses investigated the biological connection between genes and the abnormal phenotypes. Results Overall, rare missense mutations in SIX1, KDR/VEGFR2, and PDGFRA showed the best evidence of segregation with the OAV phenotypes in this family. When considering affection with any of the 3 OAVS phenotypes as an outcome, parent-TDTs and sib-TDTs (unadjusted p-values) found that SIX1 (p=0.025, p=0.052), followed by PDGFRA (p=0.180, p=0.069) and KDR/VEGFR2 (p=0.180, p=0.069) have the strongest associations in this family. Burden analysis via a penalized linear mixed model identified SIX1 (RC=0.87) and PDGFRA (RC=0.98) as having the strongest association with OAVS severity. Using phenotype-specific outcomes, sib-TDTs identified associations between (1) SIX1 with uni- or bilateral ptosis (p=0.049) and ear tags (p=0.01), (2) PDGFRA and KDR/VEGFR2 with ear tags (both p<0.01). Conclusion Our study reports the genomic findings of a large family with multiple individuals affected with OAVS phenotypes with autosomal dominant inheritance. 
Our findings narrow the search to three candidate genes: SIX1, PDGFRA, and KDR/VEGFR2. Among these, SIX1 has been previously associated with OAVS ear malformations and is co-expressed with EYA1 during ear development. Efforts to strengthen the genotype-phenotype correlation underlying OAVS phenotypes are essential for discovering the etiological factors leading to this complex and burdensome condition, as well as for family counseling and prevention efforts.
Affiliation(s)
- A L Petrin
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- Lam Machado-Paula
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- A Hinkle
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- L Hovey
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- W Awotoye
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- M Chimenti
  - Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- B Darbro
  - Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- S M Dabdoub
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- T Peter
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- J Murray
  - Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- E Van Otterloo
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
- L M Moreno-Uribe
  - College of Dentistry and Dental Clinics, University of Iowa, Iowa City, IA, USA
|
10
|
Brancato V, Esposito G, Coppola L, Cavaliere C, Mirabelli P, Scapicchio C, Borgheresi R, Neri E, Salvatore M, Aiello M. Standardizing digital biobanks: integrating imaging, genomic, and clinical data for precision medicine. J Transl Med 2024; 22:136. [PMID: 38317237 PMCID: PMC10845786 DOI: 10.1186/s12967-024-04891-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 01/14/2024] [Indexed: 02/07/2024] Open
Abstract
Advancements in data acquisition and computational methods are generating a large amount of heterogeneous biomedical data from diagnostic domains such as clinical imaging, pathology, and next-generation sequencing (NGS), which help characterize individual differences in patients. However, this information needs to be available and suitable to promote and support scientific research and technological development, supporting the effective adoption of the precision medicine approach in clinical practice. Digital biobanks can catalyze this process, facilitating the sharing of curated and standardized imaging data, clinical, pathological and molecular data, crucial to enable the development of a comprehensive and personalized data-driven diagnostic approach in disease management and fostering the development of computational predictive models. This work aims to frame this perspective, first by evaluating the state of standardization of individual diagnostic domains and then by identifying challenges and proposing a possible solution towards an integrative approach that can guarantee the suitability of information that can be shared through a digital biobank. Our analysis of the state of the art shows the presence and use of reference standards in biobanks and, generally, digital repositories for each specific domain. Despite this, standardization to guarantee the integration and reproducibility of the numerical descriptors generated by each domain, e.g. radiomic, pathomic and -omic features, is still an open challenge. Based on specific use cases and scenarios, an integration model, based on the JSON format, is proposed that can help address this problem. Ultimately, this work shows how, with specific standardization and promotion efforts, the digital biobank model can become an enabling technology for the comprehensive study of diseases and the effective development of data-driven technologies at the service of precision medicine.
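As a rough illustration of what a JSON-based integration record might look like, the sketch below links imaging, pathology, and genomic descriptors for one hypothetical patient and round-trips it through JSON. Every field name and value is an assumption made for illustration; none is taken from the schema proposed in the paper.

```python
import json

# Hypothetical record: all keys and values are illustrative assumptions,
# not the integration model proposed by the authors.
record = {
    "patient_id": "P0001",
    "imaging": {"modality": "MR", "radiomic_features": {"glcm_contrast": 0.42}},
    "pathology": {"stain": "H&E", "pathomic_features": {"nuclei_density": 1350}},
    "genomics": {"assay": "WES", "variants": [{"gene": "TP53", "hgvs_p": "p.R175H"}]},
}


def domains_present(rec):
    """Return the sorted names of the diagnostic domains that carry data."""
    return sorted(k for k, v in rec.items() if isinstance(v, dict) and v)


# Serialize and restore to confirm the record survives a JSON round trip.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(domains_present(restored))
```

Keeping each diagnostic domain under its own top-level key is one simple way to let domain-specific descriptors evolve independently while sharing a patient identifier.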
Affiliation(s)
- Giuseppina Esposito
  - Bio Check Up S.R.L, 80121, Naples, Italy
  - Department of Advanced Biomedical Sciences, University of Naples Federico II, 80131, Naples, Italy
- Peppino Mirabelli
  - UOS Laboratori di Ricerca e Biobanca, AORN Santobono-Pausilipon, Via Teresa Ravaschieri, 8, 80122, Naples, Italy
- Camilla Scapicchio
  - Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
- Rita Borgheresi
  - Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
- Emanuele Neri
  - Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
|
11
|
Charron P, Kang M. VariantDetective: an accurate all-in-one pipeline for detecting consensus bacterial SNPs and SVs. Bioinformatics 2024; 40:btae066. [PMID: 38366603 PMCID: PMC10898327 DOI: 10.1093/bioinformatics/btae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 01/16/2024] [Accepted: 02/14/2024] [Indexed: 02/18/2024] Open
Abstract
MOTIVATION Genomic variations comprise a spectrum of alterations, ranging from single nucleotide polymorphisms (SNPs) to large-scale structural variants (SVs), which play crucial roles in bacterial evolution and species diversification. Accurately identifying SNPs and SVs is beneficial for subsequent evolutionary and epidemiological studies. This study presents VariantDetective (VD), a novel, user-friendly, and all-in-one pipeline combining SNP and SV calling to generate consensus genomic variants using multiple tools. RESULTS The VD pipeline accepts various file types as input to initiate SNP and/or SV calling, and benchmarking results demonstrate VD's robustness and high accuracy across multiple tested datasets when compared to existing variant calling approaches. AVAILABILITY AND IMPLEMENTATION The source code, test data, and relevant information for VD are freely accessible at https://github.com/OLF-Bioinformatics/VariantDetective under the MIT License.
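The idea of a consensus call set can be sketched as a simple support-count filter: a variant is kept only if enough independent callers report it. This is a generic illustration under stated assumptions, not VariantDetective's actual algorithm; the caller names, the `min_support` threshold, and the variant tuples are all hypothetical.

```python
from collections import Counter


def consensus_variants(callsets, min_support=2):
    """Keep variants reported by at least `min_support` callers.

    `callsets` maps a caller name to a set of (chrom, pos, ref, alt) tuples,
    so each caller contributes at most one vote per variant.
    """
    counts = Counter(v for calls in callsets.values() for v in calls)
    return sorted(v for v, n in counts.items() if n >= min_support)


# Hypothetical call sets from three unnamed callers.
calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr1", 900, "G", "A")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
}
print(consensus_variants(calls))
print(consensus_variants(calls, min_support=3))
```

Raising `min_support` trades sensitivity for precision: fewer variants pass, but each one is backed by more callers.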
Affiliation(s)
- Philippe Charron
  - Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada
- Mingsong Kang
  - Ottawa Laboratory-Fallowfield, Canadian Food Inspection Agency, 3851 Fallowfield Road, Nepean, Ontario K2J 4S1, Canada
|
12
|
Ha YJ, Kang S, Kim J, Kim J, Jo SY, Kim S. Comprehensive benchmarking and guidelines of mosaic variant calling strategies. Nat Methods 2023; 20:2058-2067. [PMID: 37828153 PMCID: PMC10703685 DOI: 10.1038/s41592-023-02043-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2022] [Accepted: 09/12/2023] [Indexed: 10/14/2023]
Abstract
Rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants represented as heterozygous, homozygous and mosaic mutations. However, the best practices for mosaic variant calling remain disorganized owing to the technical and conceptual difficulties faced in evaluation. Here we present our benchmark of 11 feasible mosaic variant detection approaches based on a systematically designed whole-exome-level reference standard that mimics mosaic samples, supported by 354,258 control positive mosaic single-nucleotide variants and insertion-deletion mutations and 33,111,725 control negatives. We identified not only the best practice for mosaic variant detection but also the condition-dependent strengths and weaknesses of the current methods. Furthermore, feature-level evaluation and their combinatorial usage across multiple algorithms direct the way for immediate to prolonged improvements in mosaic variant detection. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.
Affiliation(s)
- Yoo-Jin Ha
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
  - Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Seungseok Kang
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
  - Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jisoo Kim
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Junhan Kim
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Se-Young Jo
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
  - Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sangwoo Kim
  - Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
  - Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
  - POSTECH Biotechnology Center, Pohang University of Science and Technology, Pohang, Republic of Korea
|
13
|
Park H, Gim J. A comparative investigation of single nucleotide variant calling for a personal non-Caucasian sequencing sample. Genes Genomics 2023; 45:1527-1536. [PMID: 37651066 DOI: 10.1007/s13258-023-01439-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023]
Abstract
BACKGROUND The falling cost and increasing clinical application of whole genome sequencing (WGS) have created a need for efficient (accurate and rapid) variant calling procedures for personal WGS data (n = 1). A number of variant calling pipelines have been introduced that use the human reference genome GRCh38 and a benchmark dataset called 'NA12878', both of which are 'standard' but of limited ethnic origin. Considering the nature of variant calling algorithms and recent updates in sequencing protocols, however, it is necessary to revisit the efficiency of the current best pipelines for personal WGS data from diverse ethnicities. OBJECTIVE We discuss the most efficient practices for variant calling of personal WGS reads, with a particular emphasis on whether (1) an ethnic match or mismatch between the reference genome and the WGS data produces a distinct result and, more importantly, (2) there is an ethnicity-specific optimal workflow. METHODS Here, we generate appropriate WGS data, DNA array data, and a sufficient number of Sanger-validated variants from a single Korean subject to perform such a comprehensive comparison. We applied these WGS reads and the 'NA12878' reads to 8 different variant calling pipelines with 2 different reference genomes (GRCh38 and KOREF, a Korean reference genome) to which the WGS reads from different ethnic origins are aligned. RESULTS We evaluated the performance of the pipelines with the matched array genotype data and Sanger sequencing validation and demonstrated that, regardless of the ethnic match/mismatch, (1) Novoalign-GATK4 showed the most efficient performance with the exceptional calls in MHC region; (2) the overall performance was better with GRCh38, while a significant difference in recall was observed. In addition, we found that removing the 'markduplication' step with PCR-free WGS data greatly reduced computing cost while maintaining performance. 
CONCLUSION For variant calling of personal PCR-free WGS data, regardless of ethnicity, we recommend Novoalign + GATK4 with GRCh38 and without 'markduplication'.
Affiliation(s)
- HyeonSeul Park
  - BK21 FOUR, Department of Integrative Biological Sciences, Chosun University, Gwangju, Republic of Korea
- JungSoo Gim
  - BK21 FOUR, Department of Integrative Biological Sciences, Chosun University, Gwangju, Republic of Korea
  - Department of Biomedical Science, Chosun University, Gwangju, Republic of Korea
  - Asian Dementia Research Initiative, Chosun University, Gwangju, Republic of Korea
|
14
|
Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023; 13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023] Open
Abstract
Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
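For readers less familiar with the two metrics quoted in this abstract, the sketch below shows how sensitivity and precision are computed from true-positive, false-negative, and false-positive counts. The counts are made up so that they reproduce an 88% / 100% pair; they are not the study's underlying data.

```python
def sensitivity(tp, fn):
    """Fraction of true variants that the caller detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of reported variants that are true: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts chosen only to illustrate the arithmetic.
tp, fn, fp = 22, 3, 0
print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"precision   = {precision(tp, fp):.0%}")
```

A caller can reach 100% precision while still missing true variants, which is why the two figures are reported together.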
Affiliation(s)
- Xudong Xiang
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Bowen Lu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dongyang Song
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Jie Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kunxian Shu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dan Pu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
|
15
|
Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform 2023; 24:bbad297. [PMID: 37587831 PMCID: PMC10516374 DOI: 10.1093/bib/bbad297] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 07/05/2023] [Accepted: 07/23/2023] [Indexed: 08/18/2023] Open
Abstract
Structural variants (SVs) are genomic rearrangements that can take many different forms such as copy number alterations, inversions and translocations. During cell development and aging, somatic SVs accumulate in the genome with potentially neutral, deleterious or pathological effects. Generation of somatic SVs is a key mutational process in cancer development and progression. Despite their importance, the detection of somatic SVs is challenging, making them less studied than somatic single-nucleotide variants. In this review, we summarize recent advances in whole-genome sequencing (WGS)-based approaches for detecting somatic SVs at the tissue and single-cell levels and discuss their advantages and limitations. First, we describe the state-of-the-art computational algorithms for somatic SV calling using bulk WGS data and compare the performance of somatic SV detectors in the presence or absence of a matched-normal control. We then discuss the unique features of cutting-edge single-cell-based techniques for analyzing somatic SVs. The advantages and disadvantages of bulk and single-cell approaches are highlighted, along with a discussion of their sensitivity to copy-neutral SVs, usefulness for functional inferences and experimental and computational costs. Finally, computational approaches for linking somatic SVs to their functional readouts, such as those obtained from single-cell transcriptome and epigenome analyses, are illustrated, with a discussion of the promise of these approaches in health and diseases.
Affiliation(s)
- Dohun Yi
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Jin-Wu Nam
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Research Institute for Convergence of Basic Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hyobin Jeong
  - Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
  - Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
|
16
|
Nguyen BQT, Tran TPD, Nguyen HT, Nguyen TN, Pham TMQ, Nguyen HTP, Tran DH, Nguyen V, Tran TS, Pham TVN, Le MT, Phan MD, Giang H, Nguyen HN, Tran LS. Improvement in neoantigen prediction via integration of RNA sequencing data for variant calling. Front Immunol 2023; 14:1251603. [PMID: 37731488 PMCID: PMC10507271 DOI: 10.3389/fimmu.2023.1251603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2023] [Accepted: 08/17/2023] [Indexed: 09/22/2023] Open
Abstract
Introduction Neoantigen-based immunotherapy has emerged as a promising strategy for improving the life expectancy of cancer patients. This therapeutic approach heavily relies on accurate identification of cancer mutations using DNA sequencing (DNAseq) data. However, current workflows tend to provide a large number of neoantigen candidates, of which only a limited number elicit efficient and immunogenic T-cell responses suitable for downstream clinical evaluation. To overcome this limitation and increase the number of high-quality immunogenic neoantigens, we propose integrating RNA sequencing (RNAseq) data into the mutation identification step in the neoantigen prediction workflow. Methods In this study, we characterize the mutation profiles identified from DNAseq and/or RNAseq data in tumor tissues of 25 patients with colorectal cancer (CRC). Immunogenicity was then validated by ELISpot assay using synthetic long peptides (sLP). Results We detected only 22.4% of variants shared between the two methods. In contrast, RNAseq-derived variants displayed unique features of affinity and immunogenicity. We further established that neoantigen candidates identified by RNAseq data significantly increased the number of highly immunogenic neoantigens (confirmed by ELISpot) that would otherwise be overlooked if relying solely on DNAseq data. Discussion This integrative approach holds great potential for improving the selection of neoantigens for personalized cancer immunotherapy, ultimately leading to enhanced treatment outcomes and improved survival rates for cancer patients.
Affiliation(s)
- Huu Thinh Nguyen
  - University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Duc Huy Tran
  - University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Vy Nguyen
  - Medical Genetics Institute, Ho Chi Minh, Vietnam
- Thanh Sang Tran
  - University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Minh-Triet Le
  - University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Hoa Giang
  - Medical Genetics Institute, Ho Chi Minh, Vietnam
- Le Son Tran
  - Medical Genetics Institute, Ho Chi Minh, Vietnam
|
17
|
Grossi A, Rusmini M, Cusano R, Massidda M, Santamaria G, Napoli F, Angelelli A, Fava D, Uva P, Ceccherini I, Maghnie M. Whole genome sequencing in ROHHAD trios proved inconclusive: what's beyond? Front Genet 2023; 14:1031074. [PMID: 37609037 PMCID: PMC10440434 DOI: 10.3389/fgene.2023.1031074] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Accepted: 07/27/2023] [Indexed: 08/24/2023] Open
Abstract
Rapid-onset Obesity with Hypothalamic dysfunction, Hypoventilation and Autonomic Dysregulation (ROHHAD) is a rare, life-threatening, pediatric disorder of unknown etiology, whose diagnosis is made difficult by poor knowledge of its clinical manifestations and the lack of any confirmatory tests. Children with ROHHAD usually present with rapid-onset weight gain, which may be followed, over months or years, by hypothalamic dysfunction, hypoventilation, autonomic dysfunction, including impaired bowel motility, and tumors of neural crest origin. Despite the lack of evidence of inheritance in ROHHAD, several studies conducted in recent years have explored possible genetic origins, without success. In order to broaden the search for possible genetic risk factors, an attempt was made to analyse the non-coding variants in two trios (proband with parents), recruited in the Gaslini Children's Hospital in Genoa (Italy). Both patients were females, with a typical history of ROHHAD. Gene variants (single nucleotide variants, short insertions/deletions, splice variants or in tandem expansion of homopolymeric tracts) or altered genomic regions (copy number variations or structural variants) shared between the two probands were searched. Currently, we have not found any potentially pathogenic changes, consistent with the ROHHAD clinical phenotype, and involving genes, regions or pathways shared between the two trios. To definitively rule out a genetic etiology, third-generation sequencing technologies (e.g., long-read sequencing, optical mapping) should be applied, and other pathways, including those associated with immunological and autoimmune disorders, should be explored, making use not only of genomics but also of different -omic datasets.
Affiliation(s)
- A. Grossi
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- M. Rusmini
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
  - Clinical Bioinformatics, IRCCS Istituto Giannina Gaslini, Genova, Italy
- R. Cusano
  - CRS4, Science and Technology Park Polaris, Pula, Italy
- M. Massidda
  - CRS4, Science and Technology Park Polaris, Pula, Italy
- G. Santamaria
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- F. Napoli
  - Pediatric Clinic and Endocrinology, IRCCS Istituto Giannina Gaslini, Genova, Italy
- A. Angelelli
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
- D. Fava
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
- P. Uva
  - Clinical Bioinformatics, IRCCS Istituto Giannina Gaslini, Genova, Italy
- I. Ceccherini
  - Laboratory of Genetics and Genomics of Rare Diseases, IRCCS Istituto Giannina Gaslini, Genova, Italy
- M. Maghnie
  - Pediatric Clinic and Endocrinology, IRCCS Istituto Giannina Gaslini, Genova, Italy
  - D.I.N.O.G.M.I, Università degli Studi di Genova, Genova, Italy
|
18
|
Alganmi N, Abusamra H. Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools. PLoS One 2023; 18:e0288371. [PMID: 37535628 PMCID: PMC10399881 DOI: 10.1371/journal.pone.0288371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2022] [Accepted: 06/26/2023] [Indexed: 08/05/2023] Open
Abstract
Next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false-positive call rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM2 and DRAGEN-GATK, this study achieved faster and more accurate detection of SNVs and indels than the standard GATK Best Practices workflow. This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.
Affiliation(s)
- Nofe Alganmi
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Heba Abusamra
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
|
19
|
Wilton R, Szalay AS. Short-read aligner performance in germline variant identification. Bioinformatics 2023; 39:btad480. [PMID: 37527006 PMCID: PMC10421969 DOI: 10.1093/bioinformatics/btad480] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 06/01/2023] [Accepted: 07/31/2023] [Indexed: 08/03/2023] Open
Abstract
MOTIVATION Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools. RESULTS In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers: DeepVariant, FreeBayes, and GATK HaplotypeCaller. We discuss the behavior of the read aligners with regard to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant-calling performance.
Affiliation(s)
- Richard Wilton
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, United States
| | - Alexander S Szalay
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, United States
| |
|
20
|
O'Connell KA, Yosufzai ZB, Campbell RA, Lobb CJ, Engelken HT, Gorrell LM, Carlson TB, Catana JJ, Mikdadi D, Bonazzi VR, Klenk JA. Accelerating genomic workflows using NVIDIA Parabricks. BMC Bioinformatics 2023; 24:221. [PMID: 37259021 PMCID: PMC10230726 DOI: 10.1186/s12859-023-05292-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/20/2022] [Accepted: 04/15/2023] [Indexed: 06/02/2023] Open
Abstract
BACKGROUND As genome sequencing becomes better integrated into scientific research, government policy, and personalized medicine, the primary challenge for researchers is shifting from generating raw data to analyzing these vast datasets. Although much work has been done to reduce compute times using various configurations of traditional CPU computing infrastructures, Graphics Processing Units (GPUs) offer opportunities to accelerate genomic workflows by orders of magnitude. Here we benchmark one GPU-accelerated software suite called NVIDIA Parabricks on Amazon Web Services (AWS), Google Cloud Platform (GCP), and an NVIDIA DGX cluster. We benchmarked six variant calling pipelines, including two germline callers (HaplotypeCaller and DeepVariant) and four somatic callers (Mutect2, Muse, LoFreq, SomaticSniper). RESULTS We achieved up to 65× acceleration with germline variant callers, bringing HaplotypeCaller runtimes down from 36 h to 33 min on AWS, 35 min on GCP, and 24 min on the NVIDIA DGX. Somatic callers exhibited more variation across GPU counts and computing platforms. On cloud platforms, GPU-accelerated germline callers resulted in cost savings compared with CPU runs, whereas some somatic callers were more expensive than CPU runs because their GPU acceleration was not sufficient to overcome the increased GPU cost. CONCLUSIONS Germline variant callers scaled well with the number of GPUs across platforms, whereas somatic variant callers exhibited more variation in the number of GPUs with the fastest runtimes, suggesting that, at least with the version of Parabricks used here, these workflows are less GPU optimized and require benchmarking on the platform of choice before being deployed at production scales. Our study demonstrates that GPUs can be used to greatly accelerate genomic workflows, thus bringing urgent societal advances in biosurveillance and personalized medicine closer to reach.
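The speedup and cost reasoning in this abstract can be illustrated with a small Python sketch. Only the 36 h and 33 min AWS runtimes come from the abstract; the function names and the hourly price ratios are hypothetical and chosen purely for illustration, and the cost model (cost = hourly price × runtime) is a simplifying assumption rather than the authors' method.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_minutes <= 0 or accelerated_minutes <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_minutes / accelerated_minutes


def gpu_run_is_cheaper(speedup_factor: float, hourly_price_ratio: float) -> bool:
    """Assume cost = hourly price x runtime.

    hourly_price_ratio is (GPU instance price) / (CPU instance price), a
    hypothetical input. The GPU run costs less only when the speedup exceeds
    that price ratio.
    """
    return hourly_price_ratio / speedup_factor < 1.0


# Runtimes reported in the abstract for HaplotypeCaller on AWS.
cpu_minutes = 36 * 60
aws_speedup = speedup(cpu_minutes, 33)
print(f"AWS speedup: {aws_speedup:.1f}x")

# Hypothetical price ratios: a large speedup absorbs a 10x price premium,
# while a modest 3x speedup (as might occur for a somatic caller) does not.
print(gpu_run_is_cheaper(aws_speedup, 10.0))
print(gpu_run_is_cheaper(3.0, 10.0))
```

Under this simple model, a caller whose GPU speedup is smaller than the GPU-to-CPU price ratio ends up more expensive on GPU, which matches the qualitative pattern the abstract describes for some somatic callers.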
Affiliation(s)
- Kyle A O'Connell
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | | | - Ross A Campbell
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Collin J Lobb
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Haley T Engelken
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Laura M Gorrell
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Thad B Carlson
- Cloud Managed Services, Deloitte Consulting LLP, Detroit, MI, 48226, USA
| | - Josh J Catana
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Dina Mikdadi
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA
| | - Vivien R Bonazzi
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA.
| | - Juergen A Klenk
- Health Data and AI, Deloitte Consulting LLP, VA, 22009, Arlington, USA.
| |
|
21
|
Zhai Y, Bardel C, Vallée M, Iwaz J, Roy P. Performance comparisons between clustering models for reconstructing NGS results from technical replicates. Front Genet 2023; 14:1148147. [PMID: 37007945 PMCID: PMC10060969 DOI: 10.3389/fgene.2023.1148147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023] Open
Abstract
To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may be thus recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
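The four performance indicators named in this abstract (sensitivity, precision, accuracy, and F1-score) follow from a confusion matrix of variant calls against a truth set. The Python sketch below uses the standard textbook definitions; the `Counts` class and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for a callset compared with a truth set."""
    tp: int  # true variants that were called
    fp: int  # calls that are not true variants
    fn: int  # true variants that were missed
    tn: int  # non-variant sites correctly left uncalled


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def sensitivity(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def precision(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def accuracy(c: Counts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Counts) -> float:
    p, r = precision(c), sensitivity(c)
    return _ratio(2 * p * r, p + r)


# Hypothetical counts, for illustration only.
example = Counts(tp=989, fp=20, fn=11, tn=8980)
for name, fn in [("sensitivity", sensitivity), ("precision", precision),
                 ("accuracy", accuracy), ("F1", f1_score)]:
    print(f"{name}: {fn(example):.4f}")
```

Because F1 is the harmonic mean of precision and sensitivity, a model that raises precision at the expense of sensitivity can still lose on F1, which is why the abstract reports both alongside it.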
Affiliation(s)
- Yue Zhai
- Université Lyon 1, Lyon, France
- Université de Lyon, Lyon, France
- Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- *Correspondence: Yue Zhai,
| | - Claire Bardel
- Université Lyon 1, Lyon, France
- Université de Lyon, Lyon, France
- Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
- Service de Génétique, Hospices Civils de Lyon, Bron, France
| | - Maxime Vallée
- Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France
| | - Jean Iwaz
- Université Lyon 1, Lyon, France
- Université de Lyon, Lyon, France
- Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
| | - Pascal Roy
- Université Lyon 1, Lyon, France
- Université de Lyon, Lyon, France
- Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
| |
|
22
|
Park H, Gim J. A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome. Research Square 2023:rs.3.rs-2580940. [PMID: 36945432 PMCID: PMC10029055 DOI: 10.21203/rs.3.rs-2580940/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
Abstract
Most genome benchmark studies utilize hg38 as a reference genome (based on Caucasian and African samples) and 'NA12878' (a Caucasian sequencing read) for comparison. Here, we aimed to elucidate whether 1) an ethnic match or mismatch between the reference genome and sequencing reads produces a distinct result; and 2) there is an optimal workflow for single-genome data. We assessed the performance of variant calling pipelines using hg38 and a Korean genome (reference genomes) and two whole-genome sequencing (WGS) read sets from different ethnic origins: Caucasian (NA12878) and Korean. The pipelines used BWA-MEM and Novoalign as mapping tools and GATK4, Strelka2, DeepVariant, and Samtools as variant callers. Using hg38 led to better performance (based on precision and recall), regardless of the ethnic origin of the WGS reads. Novoalign + GATK4 demonstrated the best performance with both WGS data sets. We assessed pipeline efficiency by removing the duplicate-marking (markduplicate) step, and all pipelines, except Novoalign + DeepVariant, maintained their performance. Novoalign identified more variants overall and in the MHC region of chr6 when combined with GATK4. No evidence suggested improved variant calling performance from single WGS reads with a different ethnic reference, re-validating hg38 utility. We recommend using Novoalign + GATK4 without duplicate marking for single PCR-free WGS data.
|
23
|
Nagasaki M, Sekiya Y, Asakura A, Teraoka R, Otokozawa R, Hashimoto H, Kawaguchi T, Fukazawa K, Inadomi Y, Murata KT, Ohkawa Y, Yamaguchi I, Mizuhara T, Tokunaga K, Sekiya Y, Hanawa T, Yamada R, Matsuda F. Design and implementation of a hybrid cloud system for large-scale human genomic research. Hum Genome Var 2023; 10:6. [PMID: 36755016 PMCID: PMC9908893 DOI: 10.1038/s41439-023-00231-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 12/20/2022] [Accepted: 12/21/2022] [Indexed: 02/10/2023] Open
Abstract
In the field of genomic medical research, the amount of large-scale information continues to increase due to advances in measurement technologies, such as high-performance sequencing and spatial omics, as well as the progress made in genomic cohort studies involving more than one million individuals. Therefore, researchers require more computational resources to analyze this information. Here, we introduce a hybrid cloud system consisting of an on-premise supercomputer, science cloud, and public cloud at the Kyoto University Center for Genomic Medicine in Japan as a solution. This system can flexibly handle various heterogeneous computational resource-demanding bioinformatics tools while scaling the computational capacity. In the hybrid cloud system, we demonstrate the way to properly perform joint genotyping of whole-genome sequencing data for a large population of 11,238, which can be a bottleneck in sequencing data analysis. This system can be one of the reference implementations when dealing with large amounts of genomic medical data in research centers and organizations.
Affiliation(s)
- Masao Nagasaki
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan.
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
| | - Yayoi Sekiya
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
| | - Akihiro Asakura
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
| | - Ryo Teraoka
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
| | - Ryoko Otokozawa
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
| | - Hiroki Hashimoto
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
| | - Takahisa Kawaguchi
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Keiichiro Fukazawa
- Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
| | - Yuichi Inadomi
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Ken T Murata
- ICT Testbed Research and Development Promotion Center National Institute of Information and Communications Technology (NICT), Tokyo, Japan
| | - Yasuyuki Ohkawa
- Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
| | - Izumi Yamaguchi
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | | | - Katsushi Tokunaga
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Yuji Sekiya
- Information Technology Center, The University of Tokyo, Chiba, Japan
| | - Toshihiro Hanawa
- Information Technology Center, The University of Tokyo, Chiba, Japan
| | - Ryo Yamada
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Fumihiko Matsuda
- Human Biosciences Unit for the Top Global Course Center for the Promotion of Interdisciplinary Education and Research (CPIER), Kyoto University, Kyoto, Japan
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| |
|
24
|
Bai H, Zhang X, Bush WS. Pharmacogenomic and Statistical Analysis. Methods Mol Biol 2023; 2629:305-330. [PMID: 36929083 DOI: 10.1007/978-1-0716-2986-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Genetic variants can alter response to drugs and other therapeutic interventions. The study of this phenomenon, called pharmacogenomics, is similar in many ways to other types of genetic studies but has distinct methodological and statistical considerations. Genetic variants involved in the processing of exogenous compounds exhibit great diversity and complexity, and the phenotypes studied in pharmacogenomics are also more complex than typical genetic studies. In this chapter, we review basic concepts in pharmacogenomic study designs, data generation techniques, statistical analysis approaches, and commonly used methods and briefly discuss the ultimate translation of findings to clinical care.
Affiliation(s)
- Haimeng Bai
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Department of Nutrition, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| | - Xueyi Zhang
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
| | - William S Bush
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA.
| |
|
25
|
Betschart RO, Thiéry A, Aguilera-Garcia D, Zoche M, Moch H, Twerenbold R, Zeller T, Blankenberg S, Ziegler A. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci Rep 2022; 12:21502. [PMID: 36513709 PMCID: PMC9748128 DOI: 10.1038/s41598-022-26181-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Accepted: 12/12/2022] [Indexed: 12/14/2022] Open
Abstract
Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was to compare 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). We sequenced one Genome in a Bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The GIAB truth sets were used for comparison, and performance was assessed by computation time, F1 score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F1 score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both were superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with DRAGEN outperforming GATK.
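The Mendelian inheritance error fraction mentioned in this abstract can be sketched in Python as follows. The abstract does not give the exact definition used in the study, so this sketch assumes one common convention: autosomal, diploid genotypes with no missing calls, where a site counts as an error when the child's genotype cannot be formed from one maternal and one paternal allele. All function names and the example genotypes are hypothetical.

```python
from typing import Iterable, Tuple

Genotype = Tuple[int, int]            # two allele indices, e.g. (0, 1) for 0/1
Trio = Tuple[Genotype, Genotype, Genotype]  # (child, mother, father)


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype) -> bool:
    """True if the child can inherit one allele from each parent."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m in mother for f in father)


def mendelian_error_fraction(trios: Iterable[Trio]) -> float:
    """Fraction of trio sites whose genotypes violate Mendelian inheritance."""
    total = errors = 0
    for child, mother, father in trios:
        total += 1
        if not is_mendelian_consistent(child, mother, father):
            errors += 1
    return errors / total if total else 0.0


# Hypothetical genotypes at four sites, for illustration only.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # error: mother cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # error: neither parent carries allele 1
]
print(mendelian_error_fraction(sites))
```

A lower fraction suggests fewer genotyping errors across the trio, although true de novo variants would also be counted as errors under this simplified definition.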
Affiliation(s)
- Raphael O. Betschart
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
| | - Alexandre Thiéry
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
| | - Domingo Aguilera-Garcia
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
| | - Martin Zoche
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
| | - Holger Moch
- Institute of Pathology and Molecular Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
| | - Raphael Twerenbold
- Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Tanja Zeller
- Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Stefan Blankenberg
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
- Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- University Center of Cardiovascular Research Hamburg, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, 7265 Davos Wolfgang, Switzerland
- Department of Cardiology, University Heart & Vascular Center, University Medical Center Hamburg Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- School Mathematics, Statistics and Computer Science, Scottsville, Private Bag X01, Pietermaritzburg, 3209 South Africa
| |
|
26
|
Chen J, Ying L, Zeng L, Li C, Jia Y, Yang H, Yang G. The novel compound heterozygous rare variants may impact positively selected regions of TUBGCP6, a microcephaly associated gene. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.1059477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022] Open
Abstract
INTRODUCTION Microcephaly is a rare and severe disease, probably under purifying selection because of the reduction in human brain size. In contrast, brain-size enlargement is most probably driven by positive selection, given that it is a critical phenotypic innovation in primate and human evolution. Thus, microcephaly-related genes have been extensively studied for signals of positive selection. However, whether pathogenic variants of microcephaly-related genes affect the regions under positive selection is still unclear. METHODS Here, we conducted whole genome sequencing (WGS) and positive selection analysis. RESULTS We identified novel compound heterozygous variants, p.Y613* and p.E1368K in TUBGCP6, related to microcephaly in a Chinese family. Genotyping and Sanger sequencing revealed a maternal and a paternal origin for the first and second variant, respectively. The p.Y613* variant occurs before the second and third domains of the TUBGCP6 protein, while p.E1368K is located within the linker region between the second and third domains. Interestingly, using multiple positive selection analyses, we revealed the potential impacts of these variants on the positively selected regions of TUBGCP6. The truncating variant p.Y613* could lead to the deletion of two positively selected domains, DUF5401 and Spc97_Spc98, while p.E1368K could impose a rare mutation burden on the linker region between these two domains. DISCUSSION Our investigation expands the list of candidate pathogenic variants of TUBGCP6 that may cause microcephaly. Moreover, the study provides insights into the potential pathogenic effects of variants that truncate or fall within the positively selected regions.
|
27
|
In vitro germ cell induction from fertile and infertile monozygotic twin research participants. Cell Rep Med 2022; 3:100782. [PMID: 36260988 PMCID: PMC9589117 DOI: 10.1016/j.xcrm.2022.100782] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/12/2022] [Revised: 07/23/2022] [Accepted: 09/22/2022] [Indexed: 11/08/2022]
Abstract
Human induced pluripotent stem cells (hiPSCs) enable reproductive diseases to be studied when the reproductive health of the participant is known. In this study, monozygotic (MZ) monoamniotic (MA) twins discordant for primary ovarian insufficiency (POI) consent to research to address the hypothesis that discordant POI is due to a shared primordial germ cell (PGC) progenitor pool. If this is the case, reprogramming the twins' skin cells to hiPSCs is expected to restore equivalent germ cell competency to the twins' hiPSCs. Following reprogramming, the infertile MA twin's cells are capable of generating human PGC-like cells (hPGCLCs) and amniotic sac-like structures equivalent to those of her fertile twin sister. Using these hiPSCs together with genome sequencing, our data suggest that POI in the infertile twin is not due to a genetic barrier to amnion or germ cell formation and support the hypothesis that during gestation, amniotic PGCs are likely disproportionately allocated to the fertile twin with embryo splitting.
|
28
|
Zhang K, Yu L, Lin G, Li J. A multi-laboratory assessment of clinical exome sequencing for detection of hereditary disease variants: 4441 ClinVar variants for clinical genomic test development and validation. Clin Chim Acta 2022; 535:99-107. [PMID: 35985503 DOI: 10.1016/j.cca.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 08/01/2022] [Accepted: 08/05/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND AIMS Whole-exome sequencing (WES) has become an essential tool in the clinical diagnosis of rare genetic disorders; however, the issues that reduce testing precision, sensitivity, and concordance under routine testing conditions are not clear. This study systematically evaluates the comparability of clinical WES testing results across laboratories under routine conditions. METHODS We designed a multi-laboratory study across 24 participating laboratories in China. We assessed sequencing quality across capture methods and sequencing platforms, benchmarked the impact of coverage and callable regions on detecting single nucleotide variants (SNVs) and small insertions and deletions (Indels) under the same computational approaches, and compared the sensitivity, precision and reproducibility of mutation detection across laboratories. RESULTS High inter-laboratory variability in variant detection was found across participating laboratories. Sample DNA concentration and sequencing evenness are two major variables that lead to coverage variation. Differences in bioinformatics tools and computational settings affect the sensitivity and precision of the final output. In addition, identification of copy-number variants (CNVs) is less reproducible than that of SNVs and Indels in WES testing. We also compiled a list of 4441 low-coverage ClinVar variants in 1176 genes from this study, which can be used as a source for creating in silico and synthetic DNA reference materials for clinical genetic disorder detection. CONCLUSIONS The considerable inter-laboratory variability seen in both sequencing coverage evenness and variant detection highlights the urgent need to improve the precision, sensitivity and comparability of the results generated across different laboratories. The list of low-coverage variants can have important implications for the development and validation of clinical genetic disorder tests by laboratories. This study also serves to inform best practice guidelines for detecting clinical genetic disorders by exome sequencing.
Affiliation(s)
- Kuo Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, PR China
| | - Lijia Yu
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, PR China; National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China
| | - Guigao Lin
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, PR China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, PR China; National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PR China.
| |
|
29
|
Mohr DW, Gaughran SJ, Paschall J, Naguib A, Pang AWC, Dudchenko O, Aiden EL, Church DM, Scott AF. A Chromosome-Length Assembly of the Hawaiian Monk Seal (Neomonachus schauinslandi): A History of “Genetic Purging” and Genomic Stability. Genes (Basel) 2022; 13:genes13071270. [PMID: 35886053 PMCID: PMC9323584 DOI: 10.3390/genes13071270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 06/29/2022] [Accepted: 07/07/2022] [Indexed: 12/04/2022] Open
Abstract
The Hawaiian monk seal (HMS) is the single extant species of tropical earless seals of the genus Neomonachus. The species survived a severe bottleneck in the late 19th century and experienced subsequent population declines until becoming the subject of a NOAA-led species recovery effort beginning in 1976, when the population was fewer than 1000 animals. Like other recovering species, the Hawaiian monk seal has been reported to have reduced genetic heterogeneity due to the bottleneck and subsequent inbreeding. Here, we report a chromosomal reference assembly for a male animal produced using a variety of methods. The final assembly consisted of 16 autosomes, an X, and portions of the Y chromosome. We compared variants in this animal to other HMS and to a frequently sequenced human sample, confirming about 12% of the variation seen in man. To confirm that the reference animal was representative of the HMS, we compared his sequence to that of 10 other individuals and noted similarly low variation in all. Variation in the major histocompatibility (MHC) genes was nearly absent compared to the orthologous human loci. Demographic analysis predicts that Hawaiian monk seals have had a long history of small populations preceding the bottleneck, and their current low levels of heterozygosity may indicate specialization to a stable environment. When we compared our reference assembly to that of other species, we observed significant conservation of chromosomal architecture with other pinnipeds, especially other phocids. This reference should be a useful tool for future evolutionary studies as well as the long-term management of this species.
Affiliation(s)
- David W. Mohr
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Stephen J. Gaughran
- Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
| | - Justin Paschall
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Ahmed Naguib
- Bionano Genomics, Inc., 9640 Towne Centre Dr., Suite 100, San Diego, CA 92121, USA
| | - Andy Wing Chun Pang
- Bionano Genomics, Inc., 9640 Towne Centre Dr., Suite 100, San Diego, CA 92121, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Erez Lieberman Aiden
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- UWA School of Agriculture and Environment, The University of Western Australia, Crawley, WA 6009, Australia
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Shanghai 201210, China
| | | | - Alan F. Scott
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA; (D.W.M.); (J.P.)
- Correspondence:
| |
|
30
|
Abstract
Whole Exome Sequencing (WES) is used for querying DNA variants using the protein coding parts of genomes (exomes). However, WES analysis can be challenging because of the complexity of the data. Here, we describe a consolidated protocol for unbiased WES analysis. The protocol uses three variant callers (HaplotypeCaller, FreeBayes, and DeepVariant), which have different underlying models. We provide detailed execution steps, as well as basic variant filtering, annotation, visualization, and consolidation aspects.
- Protocol to enable whole exome data analysis in an unbiased approach
- A protocol for unbiased analysis using 3 variant callers with different underlying models
- From raw data to filtered, consolidated, and annotated DNA variant calls
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
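The consolidation step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the rule the protocol uses to consolidate calls, so the 2-of-3 agreement threshold, the `consolidate` function, and the toy variant tuples below are all assumptions made for illustration only.

```python
from collections import defaultdict

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consolidate(calls: dict[str, set[Variant]], min_callers: int = 2) -> dict[Variant, list[str]]:
    """Keep variants reported by at least `min_callers` callers (assumed rule)."""
    support = defaultdict(list)
    for caller in sorted(calls):  # sorted so supporter lists are deterministic
        for variant in calls[caller]:
            support[variant].append(caller)
    return {v: names for v, names in support.items() if len(names) >= min_callers}


# Hypothetical call sets; the caller names are the three named in the abstract.
calls = {
    "HaplotypeCaller": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "FreeBayes": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
    "DeepVariant": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
}
consensus = consolidate(calls)
```

Raising `min_callers` to 3 would keep only the intersection of all callers, while setting it to 1 would keep the union; which trade-off suits a given study is a design choice the sketch leaves open.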
|
31
|
The human "contaminome": bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci Rep 2022; 12:9863. [PMID: 35701436 PMCID: PMC9198055 DOI: 10.1038/s41598-022-13269-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/11/2022] [Accepted: 05/18/2022] [Indexed: 01/11/2023]
Abstract
The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into common viral, bacterial, and computational contamination that plague whole genome sequencing studies. We present several notable results: (1) In addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines. (2) Sequencing plate and biological sample source of a sample strongly influence contamination profile. And, (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination is prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.
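As a rough illustration of how "unmapped and poorly aligned reads" might be pulled from an alignment, the sketch below scans SAM-formatted text lines. It relies on two facts from the SAM format: the FLAG bit 0x4 marks an unmapped segment, and the fifth column holds the mapping quality. The cutoff of 10 for "poorly aligned", the function name, and the toy records are assumptions; the abstract does not give the study's actual criteria.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def candidate_read_names(sam_lines, mapq_cutoff=10):
    """Return names of reads that are unmapped or below an assumed MAPQ cutoff."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & UNMAPPED_FLAG or mapq < mapq_cutoff:
            names.append(fields[0])
    return names


# Hypothetical records with the 11 mandatory SAM columns.
sam_lines = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",  # confidently mapped
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",         # unmapped
    "r3\t0\tchr1\t500\t3\t4M\t*\t0\t0\tACGT\tFFFF",   # mapped with low MAPQ
]
selected = candidate_read_names(sam_lines)
```

Reads selected this way would then be the input to whatever taxonomic classification a study applies; that downstream step is not sketched here.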
|
32
|
Schmidt J, Berghaus S, Blessing F, Herbeck H, Blessing J, Schierack P, Rödiger S, Roggenbuck D, Wenzel F. Genotyping of familial Mediterranean fever gene (MEFV)-Single nucleotide polymorphism-Comparison of Nanopore with conventional Sanger sequencing. PLoS One 2022; 17:e0265622. [PMID: 35298548 PMCID: PMC8929590 DOI: 10.1371/journal.pone.0265622] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/03/2021] [Accepted: 03/04/2022] [Indexed: 11/18/2022]
Abstract
Background Through continuous innovation and improvement, Nanopore sequencing has become a powerful technology. Because of its fast processing time, low cost, and ability to generate long reads, this sequencing technique would be particularly suitable for clinical diagnostics. However, its raw data accuracy is inferior to that of other sequencing technologies. This constraint still results in limited use of Nanopore sequencing in the field of clinical diagnostics and requires further validation and IVD certification. Methods We evaluated the performance of the latest Nanopore sequencing in combination with a dedicated data-analysis pipeline for single nucleotide polymorphism (SNP) genotyping of the familial Mediterranean fever gene (MEFV) by amplicon sequencing of 47 clinical samples. Mutations in MEFV are associated with Mediterranean fever, a hereditary periodic fever syndrome. Conventional Sanger sequencing, which is commonly applied in clinical genetic diagnostics, was used as a reference method. Results Nanopore sequencing enabled the sequencing of 10 target regions within MEFV with high read depth (median read depth 7565x) in all samples and identified a total of 435 SNPs in the whole sample collective, of which 29 were unique. Comparison of both sequencing workflows showed a near perfect agreement with no false negative calls. Precision, Recall, and F1-Score of the Nanopore sequencing workflow were each > 0.99. Conclusions These results demonstrated the great potential of current Nanopore sequencing for application in clinical diagnostics, at least for SNP genotyping by amplicon sequencing. Other more complex applications, especially structural variant identification, require further in-depth clinical validation.
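The precision, recall, and F1-score reported above follow the standard definitions for comparing a test workflow against a reference method. A minimal sketch is given below, treating each call as a hashable tuple and the reference set as ground truth; the function name and the toy call sets are hypothetical and are not data from the study.

```python
def genotyping_metrics(test_calls: set, reference_calls: set) -> dict[str, float]:
    """Precision, recall and F1 of `test_calls` measured against `reference_calls`."""
    tp = len(test_calls & reference_calls)   # calls confirmed by the reference
    fp = len(test_calls - reference_calls)   # calls absent from the reference
    fn = len(reference_calls - test_calls)   # reference calls that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical (sample, position, allele) calls, for illustration only.
sanger = {("S01", 101, "A"), ("S01", 250, "T"), ("S02", 101, "A"), ("S03", 77, "G")}
nanopore = sanger | {("S03", 300, "C")}  # one extra call not seen by the reference
metrics = genotyping_metrics(nanopore, sanger)
```

In this toy case there are no false negatives, so recall is 1.0, while the single extra call lowers precision to 0.8.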
Affiliation(s)
- Jonas Schmidt
- Institute for Laboratory Medicine, Singen, Germany
- Faculty of Medical and Life Sciences, Furtwangen University, Villingen-Schwenningen, Germany
- Faculty Environment and Natural Sciences, Institute of Biotechnology, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany
| | | | - Frithjof Blessing
- Institute for Laboratory Medicine, Singen, Germany
- Faculty of Medical and Life Sciences, Furtwangen University, Villingen-Schwenningen, Germany
| | | | | | - Peter Schierack
- Faculty Environment and Natural Sciences, Institute of Biotechnology, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg University of Technology Cottbus–Senftenberg, Senftenberg, Germany
| | - Stefan Rödiger
- Faculty Environment and Natural Sciences, Institute of Biotechnology, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg University of Technology Cottbus–Senftenberg, Senftenberg, Germany
| | - Dirk Roggenbuck
- Faculty Environment and Natural Sciences, Institute of Biotechnology, Brandenburg University of Technology Cottbus-Senftenberg, Senftenberg, Germany
- Faculty of Health Sciences Brandenburg, Brandenburg University of Technology Cottbus–Senftenberg, Senftenberg, Germany
- * E-mail:
| | - Folker Wenzel
- Faculty of Medical and Life Sciences, Furtwangen University, Villingen-Schwenningen, Germany
| |
|
33
|
Acosta-Uribe J, Aguillón D, Cochran JN, Giraldo M, Madrigal L, Killingsworth BW, Singhal R, Labib S, Alzate D, Velilla L, Moreno S, García GP, Saldarriaga A, Piedrahita F, Hincapié L, López HE, Perumal N, Morelo L, Vallejo D, Solano JM, Reiman EM, Surace EI, Itzcovich T, Allegri R, Sánchez-Valle R, Villegas-Lanau A, White CL, Matallana D, Myers RM, Browning SR, Lopera F, Kosik KS. A neurodegenerative disease landscape of rare mutations in Colombia due to founder effects. Genome Med 2022; 14:27. [PMID: 35260199 PMCID: PMC8902761 DOI: 10.1186/s13073-022-01035-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/31/2021] [Accepted: 02/26/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Colombian population, as well as those in other Latin American regions, arose from a recent tri-continental admixture among Native Americans, Spanish invaders, and enslaved Africans, all of whom passed through a population bottleneck due to widespread infectious diseases that left small isolated local settlements. As a result, the current population reflects multiple founder effects derived from diverse ancestries. METHODS We characterized the role of admixture and founder effects on the origination of the mutational landscape that led to neurodegenerative disorders under these historical circumstances. Genomes from 900 Colombian individuals with Alzheimer's disease (AD) [n = 376], frontotemporal lobar degeneration-motor neuron disease continuum (FTLD-MND) [n = 197], early-onset dementia not otherwise specified (EOD) [n = 73], and healthy participants [n = 254] were analyzed. We examined their global and local ancestry proportions and screened this cohort for deleterious variants in disease-causing and risk-conferring genes. RESULTS We identified 21 pathogenic variants in AD-FTLD related genes, and PSEN1 harbored the majority (11 pathogenic variants). Variants were identified from all three continental ancestries. TREM2 heterozygous and homozygous variants were the most common among AD risk genes (102 carriers), a point of interest because the disease risk conferred by these variants differed according to ancestry. Several gene variants that have a known association with MND in European populations had FTLD phenotypes on a Native American haplotype. Consistent with founder effects, identity by descent among carriers of the same variant was frequent. CONCLUSIONS Colombian demography with multiple mini-bottlenecks probably enhanced the detection of founder events and left a proportionally higher frequency of rare variants derived from the ancestral populations. 
These findings demonstrate the role of genomically defined ancestry in phenotypic disease expression, a phenotypic range of different rare mutations in the same gene, and further emphasize the importance of inclusiveness in genetic studies.
Affiliation(s)
- Juliana Acosta-Uribe
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - David Aguillón
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | | | - Margarita Giraldo
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
- Instituto Neurológico de Colombia (INDEC), Medellín, Colombia
| | - Lucía Madrigal
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Bradley W Killingsworth
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Rijul Singhal
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Sarah Labib
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Diana Alzate
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Lina Velilla
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Sonia Moreno
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Gloria P García
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Amanda Saldarriaga
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Francisco Piedrahita
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Liliana Hincapié
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Hugo E López
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Nithesh Perumal
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Leonilde Morelo
- Department of Internal Medicine, School of Medicine, Universidad del Sinú, Montería, Colombia
| | - Dionis Vallejo
- Department of Neurology, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Juan Marcos Solano
- Department of Neurology, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | | | - Ezequiel I Surace
- Laboratorio de Enfermedades Neurodegenerativas (Fleni-CONICET), Buenos Aires, Argentina
| | - Tatiana Itzcovich
- Laboratorio de Enfermedades Neurodegenerativas (Fleni-CONICET), Buenos Aires, Argentina
| | - Ricardo Allegri
- Centro de Memoria y Envejecimiento (Fleni-CONICET), Buenos Aires, Argentina
| | - Raquel Sánchez-Valle
- Alzheimer's Disease and Other Cognitive Disorders Unit, Hospital Clínic de Barcelona, IDIBAPS and University of Barcelona, Barcelona, Spain
| | - Andrés Villegas-Lanau
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia
| | - Charles L White
- Neuropathology Section, Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Diana Matallana
- Instituto de Envejecimiento, Department of Psychiatry, School of Medicine, Pontifical Xaverian University, Bogotá, Colombia
- Department of Mental Health, Hospital Universitario Santa Fe de Bogotá, Bogotá, Colombia
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Francisco Lopera
- Grupo de Neurociencias de Antioquia, School of Medicine, Universidad de Antioquia, Medellín, Colombia.
| | - Kenneth S Kosik
- Neuroscience Research Institute and Department of Molecular Cellular and Developmental Biology, University of California, Santa Barbara, CA, USA.
| |
|
34
|
Pillay NS, Ross OA, Christoffels A, Bardien S. Current Status of Next-Generation Sequencing Approaches for Candidate Gene Discovery in Familial Parkinson’s Disease. Front Genet 2022; 13:781816. [PMID: 35299952 PMCID: PMC8921601 DOI: 10.3389/fgene.2022.781816] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/23/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022]
Abstract
Parkinson’s disease (PD) is a neurodegenerative disorder with a heterogeneous genetic etiology. The advent of next-generation sequencing (NGS) technologies has aided novel gene discovery in several complex diseases, including PD. This Perspective article aimed to explore the use of NGS approaches to identify novel loci in familial PD, and to consider their current relevance. A total of 17 studies, spanning various populations (including Asian, Middle Eastern and European ancestry), were identified. All the studies used whole-exome sequencing (WES), with only one study incorporating both WES and whole-genome sequencing. It is worth noting how additional genetic analyses (including linkage analysis, haplotyping and homozygosity mapping) were incorporated to enhance the efficacy of some studies. Also, the use of consanguineous families and the specific search for de novo mutations appeared to facilitate the finding of causal mutations. Across the studies, similarities and differences in downstream analysis methods and the types of bioinformatic tools used were observed. Although these studies serve as a practical guide for novel gene discovery in familial PD, these approaches have not significantly resolved the “missing heritability” of PD. We speculate that what is needed is the use of third-generation sequencing technologies to identify complex genomic rearrangements and new sequence variation missed with existing methods. Additionally, the study of ancestrally diverse populations (in particular those of Black African ancestry), with the concomitant optimization and tailoring of sequencing and analytic workflows to these populations, is critical. Only then will this pave the way for exciting new discoveries in the field.
Affiliation(s)
- Nikita Simone Pillay
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
| | - Owen A. Ross
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, United States
- Department of Clinical Genomics, Mayo Clinic, Jacksonville, FL, United States
| | - Alan Christoffels
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Africa Centres for Disease Control and Prevention, African Union Headquarters, Addis Ababa, Ethiopia
| | - Soraya Bardien
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Research Unit, Cape Town, South Africa
- *Correspondence: Soraya Bardien,
| |
|
35
|
Barbitoff YA, Abasov R, Tvorogova VE, Glotov AS, Predeus AV. Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery. BMC Genomics 2022; 23:155. [PMID: 35193511 PMCID: PMC8862519 DOI: 10.1186/s12864-022-08365-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/29/2021] [Accepted: 02/03/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. Efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software. Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets. RESULTS In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 "gold standard" WES and WGS datasets available from Genome In A Bottle (GIAB) consortium. Additionally, we have indirectly evaluated each pipeline's performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than other aligners, suggesting it should not be used for medical variant calling. When other aligners were considered, the accuracy of variant discovery mostly depended on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency had greater dependence on the quality and type of the input data. We have also compared the consistency of variant calls in GIAB and non-GIAB samples. With few important caveats, best-performing tools have shown little evidence of overfitting. 
CONCLUSIONS The results show surprisingly large differences in the performance of cutting-edge tools even in high confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. Additionally, there is also a need for better variant caller assessment in the repetitive regions of the coding genome.
Affiliation(s)
- Yury A Barbitoff
- Bioinformatics Institute, St. Petersburg, Russia
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia
| | - Ruslan Abasov
- Bioinformatics Institute, St. Petersburg, Russia
- Dmitry Rogachev National Research Center of Pediatric Hematology-Oncology and Immunology, Moscow, Russia
| | - Varvara E Tvorogova
- Bioinformatics Institute, St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia
| | - Andrey S Glotov
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg, Russia
| | | |
|
36
|
Wang N, Lysenkov V, Orte K, Kairisto V, Aakko J, Khan S, Elo LL. Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data. PLoS Comput Biol 2022; 18:e1009269. [PMID: 35176018 PMCID: PMC8916674 DOI: 10.1371/journal.pcbi.1009269] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/02/2021] [Revised: 03/11/2022] [Accepted: 01/30/2022] [Indexed: 11/18/2022]
Abstract
Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.
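To make the distinction between SNVs and variably sized indels concrete, the sketch below classifies a variant from its reference and alternate allele strings, as they would appear in a normalized VCF-style record. This is a simplification assumed for illustration: the function name and labels are hypothetical, and complex substitutions of unequal length are simply treated as insertions or deletions by net length change.

```python
def classify_variant(ref: str, alt: str) -> tuple[str, int]:
    """Return (variant type, net size in bases) from REF and ALT allele strings."""
    if len(ref) == len(alt):
        # Same length: a single-base change, or a multi-nucleotide substitution.
        return ("SNV" if len(ref) == 1 else "MNV", 0)
    if len(alt) > len(ref):
        return ("insertion", len(alt) - len(ref))
    return ("deletion", len(ref) - len(alt))


# Hypothetical alleles, for illustration only.
examples = [classify_variant("A", "C"), classify_variant("A", "ATTG"), classify_variant("GCT", "G")]
```

Binning the returned size (for example, short versus long indels) is one way a benchmark could stratify precision and recall by indel length; the bin boundaries would be a choice of the analyst.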
Affiliation(s)
- Ning Wang
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Vladislav Lysenkov
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Katri Orte
- Department of Pathology, Laboratory Division, Turku University Hospital, Turku, Finland
- Department of Genomics, Laboratory Division, Turku University Hospital, Turku, Finland
| | - Veli Kairisto
- Department of Genomics, Laboratory Division, Turku University Hospital, Turku, Finland
| | - Juhani Aakko
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Sofia Khan
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- * E-mail: (SK); (LLE)
| | - Laura L. Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Institute of Biomedicine, University of Turku, Finland
- * E-mail: (SK); (LLE)
| |
|
37
|
Establishment of reference standards for multifaceted mosaic variant analysis. Sci Data 2022; 9:35. [PMID: 35115554 PMCID: PMC8813952 DOI: 10.1038/s41597-022-01133-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 12/20/2021] [Indexed: 11/21/2022]
Abstract
Detection of somatic mosaicism in non-proliferative cells is a new challenge in genome research; however, the accuracy of current detection strategies remains uncertain due to the lack of a ground truth. Herein, we present a set of ultra-deep sequenced WES data based on reference standards generated by cell line mixtures, providing a total of 386,613 mosaic single-nucleotide variants (SNVs) and insertion-deletion mutations (INDELs) with variant allele frequencies (VAFs) ranging from 0.5% to 56%, as well as 35,113,417 non-variant and 19,936 germline variant sites as a negative control. The whole reference standard set mimics the cumulative aspect of mosaic variant acquisition such as in the early developmental stage owing to the progressive mixing of cell lines with established genotypes, ultimately unveiling 741 possible inter-sample relationships with respect to variant sharing and asymmetry in VAFs. We expect that our reference data will be essential for optimizing the current use of mosaic variant detection strategies and for developing algorithms to enable future improvements.
Measurement(s): genotype
Technology Type(s): DNA sequencing
Factor Type(s): genotyping
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: cell line
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.16970041
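The idea of generating known VAFs by mixing cell lines with established genotypes can be illustrated with a short calculation. The sketch below assumes every cell line contributes DNA in proportion to its mixing fraction and has the same ploidy and copy number at the site; these assumptions, the function names, and the example numbers are illustrative and are not taken from the data descriptor.

```python
def expected_vaf(fractions, alt_allele_fractions):
    """Expected VAF of a mixture: sum of mixing fraction x per-line alt allele fraction."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mixing fractions must sum to 1")
    return sum(w * a for w, a in zip(fractions, alt_allele_fractions))


def observed_vaf(alt_reads, ref_reads):
    """VAF estimated from read counts at a single site."""
    depth = alt_reads + ref_reads
    return alt_reads / depth if depth else 0.0


# Hypothetical mixture: 99% of a homozygous-reference line (alt fraction 0.0)
# and 1% of a heterozygous line (alt fraction 0.5).
low_vaf = expected_vaf([0.99, 0.01], [0.0, 0.5])
site_vaf = observed_vaf(alt_reads=5, ref_reads=995)
```

Under these assumptions the mixture gives an expected VAF of 0.005, that is 0.5%, and 5 supporting reads out of 1,000 yield the same estimate, which shows why ultra-deep sequencing is needed to observe variants at the low end of the stated VAF range.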
|
38
|
Sahraeian SME, Fang LT, Karagiannis K, Moos M, Smith S, Santana-Quintero L, Xiao C, Colgan M, Hong H, Mohiyuddin M, Xiao W. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 2022; 23:12. [PMID: 34996510 PMCID: PMC8740374 DOI: 10.1186/s13059-021-02592-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data. RESULTS In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. CONCLUSIONS The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
Affiliation(s)
| | - Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA, 95050, USA
| | - Konstantinos Karagiannis
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
| | - Malcolm Moos
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
| | - Sean Smith
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
| | - Luis Santana-Quintero
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Michael Colgan
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
| | - Huixiao Hong
- Bioinformatics branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR, 72079, USA
| | | | - Wenming Xiao
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA.
| |
|
39
|
Yeh CH, Chou YJ, Tsai TH, Hsu PWC, Li CH, Chan YH, Tsai SF, Ng SC, Chou KM, Lin YC, Juan YH, Fu TC, Lai CC, Sytwu HK, Tsai TF. Artificial-Intelligence-Assisted Discovery of Genetic Factors for Precision Medicine of Antiplatelet Therapy in Diabetic Peripheral Artery Disease. Biomedicines 2022; 10:116. [PMID: 35052795 PMCID: PMC8773099 DOI: 10.3390/biomedicines10010116] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/05/2021] [Revised: 12/30/2021] [Accepted: 01/04/2022] [Indexed: 12/15/2022]
Abstract
An increased risk of cardiovascular events was identified in patients with peripheral artery disease (PAD). Clopidogrel is one of the most widely used antiplatelet medications. However, there are heterogeneous outcomes when clopidogrel is used to prevent cardiovascular events in PAD patients. Here, we use an artificial intelligence (AI)-assisted methodology to identify genetic factors potentially involved in the clopidogrel-resistant mechanism, which is currently unclear. Several discoveries can be pinpointed. Firstly, a high proportion (>50%) of clopidogrel resistance was found among diabetic PAD patients in Taiwan. Interestingly, our result suggests that platelet function test-guided antiplatelet therapy appears to reduce the post-interventional occurrence of major adverse cerebrovascular and cardiac events in diabetic PAD patients. Secondly, AI-assisted genome-wide association study of a single-nucleotide polymorphism (SNP) database identified a SNP signature composed of 20 SNPs, which are mapped into 9 protein-coding genes (SLC37A2, IQSEC1, WASHC3, PSD3, BTBD7, GLIS3, PRDM11, LRBA1, and CNR1). Finally, analysis of the protein connectivity map revealed that LRBA, GLIS3, BTBD7, IQSEC1, and PSD3 appear to form a protein interaction network. Intriguingly, the genetic factors seem to pinpoint a pathway related to endocytosis and recycling of P2Y12 receptor, which is the drug target of clopidogrel. Our findings reveal that a combination of AI-assisted discovery of SNP signatures and clinical parameters has the potential to develop an ethnic-specific precision medicine for antiplatelet therapy in diabetic PAD patients.
Affiliation(s)
- Chi-Hsiao Yeh
  - Department of Thoracic and Cardiovascular Surgery, Chang Gung Memorial Hospital, Taoyuan 333, Taiwan
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan; (Y.-C.L.); (Y.-H.J.); (T.-C.F.)
  - Community Medicine Research Center, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Yi-Ju Chou
  - Institute of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli 350, Taiwan; (Y.-J.C.); (P.W.-C.H.); (S.-F.T.)
- Tsung-Hsien Tsai
  - Advanced Tech BU, Acer Inc., New Taipei City 221, Taiwan; (T.-H.T.); (C.-H.L.); (Y.-H.C.)
- Paul Wei-Che Hsu
  - Institute of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli 350, Taiwan; (Y.-J.C.); (P.W.-C.H.); (S.-F.T.)
- Chun-Hsien Li
  - Advanced Tech BU, Acer Inc., New Taipei City 221, Taiwan; (T.-H.T.); (C.-H.L.); (Y.-H.C.)
- Yun-Hsuan Chan
  - Advanced Tech BU, Acer Inc., New Taipei City 221, Taiwan; (T.-H.T.); (C.-H.L.); (Y.-H.C.)
- Shih-Feng Tsai
  - Institute of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli 350, Taiwan; (Y.-J.C.); (P.W.-C.H.); (S.-F.T.)
- Soh-Ching Ng
  - Department of Internal Medicine, Division of Endocrinology and Metabolism, Chang Gung Memorial Hospital, Keelung 204, Taiwan; (S.-C.N.); (K.-M.C.)
- Kuei-Mei Chou
  - Department of Internal Medicine, Division of Endocrinology and Metabolism, Chang Gung Memorial Hospital, Keelung 204, Taiwan; (S.-C.N.); (K.-M.C.)
- Yu-Ching Lin
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan; (Y.-C.L.); (Y.-H.J.); (T.-C.F.)
  - Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Yu-Hsiang Juan
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan; (Y.-C.L.); (Y.-H.J.); (T.-C.F.)
  - Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Tieh-Cheng Fu
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan; (Y.-C.L.); (Y.-H.J.); (T.-C.F.)
  - Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Chi-Chun Lai
  - College of Medicine, Chang Gung University, Taoyuan 333, Taiwan; (Y.-C.L.); (Y.-H.J.); (T.-C.F.)
  - Community Medicine Research Center, Chang Gung Memorial Hospital, Keelung 204, Taiwan
  - Department of Ophthalmology, Chang Gung Memorial Hospital, Keelung 204, Taiwan
  - Correspondence: (C.-C.L.); (H.-K.S.); (T.-F.T.); Tel.: +886-2-24313131 (ext. 6101) (C.-C.L.); +886-37-206166 (ext. 31010) (H.-K.S.); +886-2-28267293 (T.-F.T.)
- Huey-Kang Sytwu
  - National Institute of Infectious Diseases and Vaccinology, National Health Research Institutes, Miaoli 350, Taiwan
  - National Defense Medical Center, Department & Graduate Institute of Microbiology and Immunology, Taipei 114, Taiwan
  - Correspondence: (C.-C.L.); (H.-K.S.); (T.-F.T.); Tel.: +886-2-24313131 (ext. 6101) (C.-C.L.); +886-37-206166 (ext. 31010) (H.-K.S.); +886-2-28267293 (T.-F.T.)
- Ting-Fen Tsai
  - Institute of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli 350, Taiwan; (Y.-J.C.); (P.W.-C.H.); (S.-F.T.)
  - Departments of Life Sciences and Institute of Genome Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
  - Center for Healthy Longevity and Aging Sciences, National Yang Ming Chiao Tung University, Taipei 112, Taiwan
  - Correspondence: (C.-C.L.); (H.-K.S.); (T.-F.T.); Tel.: +886-2-24313131 (ext. 6101) (C.-C.L.); +886-37-206166 (ext. 31010) (H.-K.S.); +886-2-28267293 (T.-F.T.)
|
40
|
The correctness of large scale analysis of genomic data. FOUNDATIONS OF COMPUTING AND DECISION SCIENCES 2021. [DOI: 10.2478/fcds-2021-0024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Implementing a large genomic project is a demanding task, also from the computer science point of view. Besides collecting many genome samples and sequencing them, there is processing of a huge amount of data at every stage of their production and analysis. Efficient transfer and storage of the data is also an important issue. During the execution of such a project, there is a need to maintain work standards and control quality of the results, which can be difficult if a part of the work is carried out externally. Here, we describe our experience with such data quality analysis on a number of levels - from an obvious check of the quality of the results obtained, to examining consistency of the data at various stages of their processing, to verifying, as far as possible, their compatibility with the data describing the sample.
|
41
|
Yan YH, Chen SX, Cheng LY, Rodriguez AY, Tang R, Cabrera K, Zhang DY. Confirming putative variants at ≤ 5% allele frequency using allele enrichment and Sanger sequencing. Sci Rep 2021; 11:11640. [PMID: 34079006 PMCID: PMC8172533 DOI: 10.1038/s41598-021-91142-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 01/27/2021] [Accepted: 05/21/2021] [Indexed: 12/19/2022] Open
Abstract
Whole exome sequencing (WES) is used to identify mutations in a patient's tumor DNA that are predictive of tumor behavior, including the likelihood of response or resistance to cancer therapy. WES has a mutation limit of detection (LoD) at variant allele frequencies (VAF) of 5%. Putative mutations called at ≤ 5% VAF are frequently due to sequencing errors, therefore reporting these subclonal mutations incurs risk of significant false positives. Here we performed ~ 1000 × WES on fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue biopsy samples from a non-small cell lung cancer patient, and identified 226 putative mutations at between 0.5 and 5% VAF. Each variant was then tested using NuProbe NGSure, to confirm the original WES calls. NGSure utilizes Blocker Displacement Amplification to first enrich the allelic fraction of the mutation and then uses Sanger sequencing to determine mutation identity. Results showed that 52% of the 226 (117) putative variants were disconfirmed, among which 2% (5) putative variants were found to be misidentified in WES. In the 66 cancer-related variants, the disconfirmed rate was 82% (54/66). This data demonstrates Blocker Displacement Amplification allelic enrichment coupled with Sanger sequencing can be used to confirm putative mutations ≤ 5% VAF. By implementing this method, next-generation sequencing can reliably report low-level variants at a high sensitivity, without the cost of high sequencing depth.
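The percentages quoted in this abstract can be checked against the raw counts it reports. The short sketch below does only that arithmetic. The variable names and the `pct` helper are illustrative, and the reading that the 2% figure is taken relative to all 226 putative variants (rather than to the 117 disconfirmed ones) is an assumption based on the numbers.

```python
# Counts are taken from the abstract; names and the helper are illustrative.
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

total_putative = 226        # putative variants called at 0.5-5% VAF
disconfirmed = 117          # not confirmed by enrichment plus Sanger sequencing
misidentified = 5           # variant identity differed from the WES call
cancer_related = 66         # cancer-related subset of the putative variants
cancer_disconfirmed = 54    # disconfirmed variants within that subset

rates = {
    "disconfirmed": pct(disconfirmed, total_putative),
    # Assumption: the abstract's "2%" uses all 226 variants as the denominator.
    "misidentified": pct(misidentified, total_putative),
    "cancer_disconfirmed": pct(cancer_disconfirmed, cancer_related),
}
print(rates)
```

Under that assumption the computed rates round to the 52%, 2%, and 82% stated in the abstract.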
Affiliation(s)
- Sherry X Chen
  - Department of Bioengineering, Rice University, 6500 Main St, Houston, TX, 77030, USA
- Lauren Y Cheng
  - Department of Bioengineering, Rice University, 6500 Main St, Houston, TX, 77030, USA
- Rui Tang
  - NuProbe USA, Inc., Houston, TX, USA
- David Yu Zhang
  - Department of Bioengineering, Rice University, 6500 Main St, Houston, TX, 77030, USA
  - Systems, Synthetic, and Physical Biology, Rice University, 6500 Main St, Houston, TX, 77030, USA
|
42
|
Prins BP, Leitsalu L, Pärna K, Fischer K, Metspalu A, Haller T, Snieder H. Advances in Genomic Discovery and Implications for Personalized Prevention and Medicine: Estonia as Example. J Pers Med 2021; 11:jpm11050358. [PMID: 33946982 PMCID: PMC8145318 DOI: 10.3390/jpm11050358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 04/19/2021] [Accepted: 04/25/2021] [Indexed: 02/07/2023] Open
Abstract
The current paradigm of personalized medicine envisages the use of genomic data to provide predictive information on the health course of an individual with the aim of prevention and individualized care. However, substantial efforts are required to realize the concept: enhanced genetic discoveries, translation into intervention strategies, and a systematic implementation in healthcare. Here we review how further genetic discoveries are improving personalized prediction and advance functional insights into the link between genetics and disease. In the second part we give our perspective on the way these advances in genomic research will transform the future of personalized prevention and medicine using Estonia as a primer.
Affiliation(s)
- Bram Peter Prins
  - MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
  - Correspondence: (B.P.P.); (H.S.)
- Liis Leitsalu
  - Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; (L.L.); (K.P.); (K.F.); (A.M.); (T.H.)
- Katri Pärna
  - Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; (L.L.); (K.P.); (K.F.); (A.M.); (T.H.)
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, 9700 RB Groningen, The Netherlands
  - Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
- Krista Fischer
  - Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; (L.L.); (K.P.); (K.F.); (A.M.); (T.H.)
  - Institute of Mathematics and Statistics, University of Tartu, 50409 Tartu, Estonia
- Andres Metspalu
  - Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; (L.L.); (K.P.); (K.F.); (A.M.); (T.H.)
- Toomas Haller
  - Institute of Genomics, University of Tartu, 51010 Tartu, Estonia; (L.L.); (K.P.); (K.F.); (A.M.); (T.H.)
- Harold Snieder
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, 9700 RB Groningen, The Netherlands
  - Correspondence: (B.P.P.); (H.S.)
|
43
|
Giles HH, Hegde MR, Lyon E, Stanley CM, Kerr ID, Garlapow ME, Eggington JM. The Science and Art of Clinical Genetic Variant Classification and Its Impact on Test Accuracy. Annu Rev Genomics Hum Genet 2021; 22:285-307. [PMID: 33900788 DOI: 10.1146/annurev-genom-121620-082709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Clinical genetic variant classification science is a growing subspecialty of clinical genetics and genomics. The field's continued improvement is essential for the success of precision medicine in both germline (hereditary) and somatic (oncology) contexts. This review focuses on variant classification for DNA next-generation sequencing tests. We first summarize current limitations in variant discovery and definition, and then describe the current five- and four-tier classification systems outlined in dominant standards and guideline publications for germline and somatic tests, respectively. We then discuss measures of variant classification discordance and the field's bias for positive results, as well as considerations for panel size and population screening in the context of estimates of positive predictive value that incorporate estimated variant classification imperfections. Finally, we share opinions on the current state of variant classification from some of the authors of the most widely used standards and guideline publications and from other domain experts.
Affiliation(s)
- Hunter H Giles
  - Center for Genomic Interpretation, Sandy, Utah 84092, USA
- Madhuri R Hegde
  - PerkinElmer Genomics, Waltham, Massachusetts 02450, USA
  - Department of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Elaine Lyon
  - HudsonAlpha Clinical Services Lab, Huntsville, Alabama 35806, USA
- Christine M Stanley
  - C2i Genomics, Cambridge, Massachusetts 02139, USA
  - Variantyx, Framingham, Massachusetts 01701, USA
|