1
Emül AA, Ergün MA, Ertürk RA, Çinal Ö, Baysan M. VCF observer: a user-friendly software tool for preliminary VCF file analysis and comparison. BMC Bioinformatics 2024; 25:290. [PMID: 39227760 PMCID: PMC11373448 DOI: 10.1186/s12859-024-05860-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 07/10/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND: Advancements over the past decade in DNA sequencing technology and computing power have created the potential to revolutionize medicine. There has been a marked increase in genetic data available, allowing for the advancement of areas such as personalized medicine. A crucial type of data in this context is genetic variant data which is stored in variant call format (VCF) files. However, the rapid growth in genomics has presented challenges in analyzing and comparing VCF files.
RESULTS: In response to the limitations of existing tools, this paper introduces a novel web application that provides a user-friendly solution for VCF file analyses and comparisons. The software tool enables researchers and clinicians to perform high-level analysis with ease and enhances productivity. The application's interface allows users to conveniently upload, analyze, and visualize their VCF files using simple drag-and-drop and point-and-click operations. Essential visualizations such as Venn diagrams, clustergrams, and precision-recall plots are provided to users. A key feature of the application is its support for metadata-based file grouping, accomplished through flexible data matrix uploads, streamlining organization and analysis of user-defined categories. Additionally, the application facilitates standardized benchmarking of VCF files by integrating user-provided ground truth regions and variant lists.
CONCLUSIONS: By providing a user-friendly interface and supporting essential visualizations, this software enhances the accessibility of VCF file analysis and assists researchers and clinicians in their scientific inquiries.
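The benchmarking idea in this abstract (comparing a call set against a user-provided list of ground-truth variants) can be illustrated with a minimal sketch. This is not the tool's own implementation: the function names `parse_vcf_lines` and `precision_recall` are hypothetical, variants are matched exactly on (CHROM, POS, REF, ALT) with no normalization, and ground-truth region filtering is left out.

```python
from typing import Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines (hypothetical helper).

    Header lines (starting with '#') are skipped, and multi-allelic
    records are split into one key per alternate allele.
    """
    variants: Set[Variant] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def precision_recall(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Precision and recall of a call set against a truth set (exact matching)."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data made up for illustration only.
truth_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.",
    "chr2\t300\t.\tG\tA,C\t.\tPASS\t.",
]
call_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t300\t.\tG\tA\t40\tPASS\t.",
    "chr3\t400\t.\tT\tC\t30\tPASS\t.",
]

truth = parse_vcf_lines(truth_lines)
calls = parse_vcf_lines(call_lines)
precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")  # precision=0.667 recall=0.500
```

The same set representation also gives the regions of a Venn diagram directly: `calls & truth` is the shared part, while `calls - truth` and `truth - calls` are the parts unique to each file.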
Affiliation(s)
- Abdullah Asım Emül
  - Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
  - Health Institutes of Türkiye, Istanbul, Turkey
- Mehmet Arif Ergün
  - Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
  - Health Institutes of Türkiye, Istanbul, Turkey
- Ömer Çinal
  - Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
- Mehmet Baysan
  - Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
  - Health Institutes of Türkiye, Istanbul, Turkey
2
Baykal PI, Łabaj PP, Markowetz F, Schriml LM, Stekhoven DJ, Mangul S, Beerenwinkel N. Genomic reproducibility in the bioinformatics era. Genome Biol 2024; 25:213. [PMID: 39123217 PMCID: PMC11312195 DOI: 10.1186/s13059-024-03343-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 07/23/2024] [Indexed: 08/12/2024] Open
Abstract
In biomedical research, validating a scientific discovery hinges on the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility remain imprecise. We argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent results across technical replicates, is essential for advancing scientific knowledge and medical applications. Initially, we examine different interpretations of reproducibility in genomics to clarify terms. Subsequently, we discuss the impact of bioinformatics tools on genomic reproducibility and explore methods for evaluating these tools regarding their effectiveness in ensuring genomic reproducibility. Finally, we recommend best practices to improve genomic reproducibility.
Affiliation(s)
- Pelin Icer Baykal
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
- Paweł Piotr Łabaj
  - Małopolska Centre of Biotechnology, Jagiellonian University, 30-387, Gronostajowa 7A, Krakow, Poland
  - Department of Biotechnology, Boku University Vienna, Muthgasse 18, 1190, Vienna, Austria
- Florian Markowetz
  - Cancer Research UK Cambridge Research Institute, Cambridge, CB2 0RE, UK
  - Department of Oncology, University of Cambridge, Cambridge, CB2 2XZ, UK
- Lynn M Schriml
  - Institute for Genome Sciences, University of Maryland School of Medicine, HSFIII, 670 W. Baltimore St, Baltimore, MD, 21201, USA
- Daniel J Stekhoven
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
  - NEXUS Personalized Health Technologies, ETH Zurich, 8952, Zurich, Switzerland
- Serghei Mangul
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA, 90033, USA
  - Department of Quantitative and Computational Biology, University of Southern California Dornsife College of Letters, Arts, and Sciences, Los Angeles, CA, 90089, USA
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
3
Liu X, Pang Y, Shan J, Wang Y, Zheng Y, Xue Y, Zhou X, Wang W, Sun Y, Yan X, Shi J, Wang X, Gu H, Zhang F. Beyond the base pairs: comparative genome-wide DNA methylation profiling across sequencing technologies. Brief Bioinform 2024; 25:bbae440. [PMID: 39256199 PMCID: PMC11387064 DOI: 10.1093/bib/bbae440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/28/2024] [Accepted: 08/21/2024] [Indexed: 09/12/2024] Open
Abstract
Deoxyribonucleic acid (DNA) methylation plays a key role in gene regulation and is critical for development and human disease. Techniques such as whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) allow DNA methylation analysis at the genome scale, with Illumina NovaSeq 6000 and MGI Tech DNBSEQ-T7 being popular due to their efficiency and affordability. However, detailed comparative studies of their performance are not available. In this study, we constructed 60 WGBS and RRBS libraries for two platforms using different types of clinical samples and generated approximately 2.8 terabases of sequencing data. We systematically compared quality control metrics, genomic coverage, CpG methylation levels, intra- and interplatform correlations, and performance in detecting differentially methylated positions. Our results revealed that the DNBSEQ platform exhibited better raw read quality, although base quality recalibration indicated potential overestimation of base quality. The DNBSEQ platform also showed lower sequencing depth and less coverage uniformity in GC-rich regions than did the NovaSeq platform and tended to enrich methylated regions. Overall, both platforms demonstrated robust intra- and interplatform reproducibility for RRBS and WGBS, with NovaSeq performing better for WGBS, highlighting the importance of considering these factors when selecting a platform for bisulfite sequencing.
Affiliation(s)
- Xin Liu
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
  - Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
- Yu Pang
  - Department of Bacteriology and Immunology, Beijing Chest Hospital, Capital Medical University/Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing 101149, China
- Junqi Shan
  - Department of Gastrointestinal Surgery, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong 250117, China
- Yunfei Wang
  - Hangzhou ShengTing Biotech Co. Ltd, Hangzhou, Zhejiang Province 310018, China
- Yanhua Zheng
  - Department of Hematology, The First Hospital of China Medical University, Shenyang, Liaoning Province 110001, China
- Yuhang Xue
  - Department of Hematology, The First Hospital of China Medical University, Shenyang, Liaoning Province 110001, China
- Xuerong Zhou
  - Department of Hematology, The First Hospital of China Medical University, Shenyang, Liaoning Province 110001, China
- Wenjun Wang
  - Hangzhou ShengTing Biotech Co. Ltd, Hangzhou, Zhejiang Province 310018, China
- Yanlai Sun
  - Department of Gastrointestinal Surgery, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong 250117, China
- Xiaojing Yan
  - Department of Hematology, The First Hospital of China Medical University, Shenyang, Liaoning Province 110001, China
- Jiantao Shi
  - State Key Laboratory of Molecular Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Xiaoxue Wang
  - Department of Hematology, The First Hospital of China Medical University, Shenyang, Liaoning Province 110001, China
- Hongcang Gu
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
  - Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
- Fan Zhang
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
  - Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, Anhui Province 230031, China
4
Dong F, Guo W, Liu J, Patterson TA, Hong H. BERT-based language model for accurate drug adverse event extraction from social media: implementation, evaluation, and contributions to pharmacovigilance practices. Front Public Health 2024; 12:1392180. [PMID: 38716250 PMCID: PMC11074401 DOI: 10.3389/fpubh.2024.1392180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 04/11/2024] [Indexed: 05/18/2024] Open
Abstract
Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain.
Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection.
Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweets data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively.
Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation. By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
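The three F1 scores reported in this abstract are per-class scores over word-level labels. As a rough illustration of how such a score is computed, the sketch below assumes a B/I/O-style tagging in which `B` marks the first word of an adverse event, `I` a word inside one, and `O` a word outside one. That mapping, the function name `per_class_f1`, and the toy labels are assumptions for illustration; the abstract does not state the authors' exact scheme or code.

```python
from typing import Dict, List


def per_class_f1(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Per-label F1 over aligned word-level label sequences.

    F1 = 2*TP / (2*TP + FP + FN), computed separately for each label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels made up for illustration only.
gold = ["O", "B", "I", "O", "O", "B"]
pred = ["O", "B", "O", "O", "B", "B"]

scores = per_class_f1(gold, pred)
for label, f1 in scores.items():
    print(f"{label}: F1={f1:.4f}")
# B: F1=0.8000
# I: F1=0.0000
# O: F1=0.6667
```

Reporting one score per label, rather than a single accuracy figure, matters here because words outside adverse events dominate the data, so the `O` class alone can make overall accuracy look high.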
Affiliation(s)
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
5
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
6
Connor R, Shakya M, Yarmosh DA, Maier W, Martin R, Bradford R, Brister JR, Chain PSG, Copeland CA, di Iulio J, Hu B, Ebert P, Gunti J, Jin Y, Katz KS, Kochergin A, LaRosa T, Li J, Li PE, Lo CC, Rashid S, Maiorova ES, Xiao C, Zalunin V, Purcell L, Pruitt KD. Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows. Viruses 2024; 16:430. [PMID: 38543795 PMCID: PMC10975397 DOI: 10.3390/v16030430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/12/2024] [Accepted: 02/16/2024] [Indexed: 04/01/2024] Open
Abstract
Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes that differed in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence data as input. We compared and characterized these discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.
Affiliation(s)
- Ryan Connor
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Migun Shakya
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- David A. Yarmosh
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
  - BEI Resources, Manassas, VA 20110, USA
- Wolfgang Maier
  - Galaxy Europe Team, University of Freiburg, 79085 Freiburg, Germany
- Ross Martin
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Rebecca Bradford
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
  - BEI Resources, Manassas, VA 20110, USA
- J. Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Patrick S. G. Chain
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Julia di Iulio
  - Vir Biotechnology Inc., San Francisco, CA 94158, USA; (J.d.I.); (L.P.)
- Bin Hu
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Philip Ebert
  - Eli Lilly and Company, Indianapolis, IN 46225, USA
- Jonathan Gunti
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Yumi Jin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Kenneth S. Katz
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Andrey Kochergin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Tré LaRosa
  - Deloitte Consulting LLP, Rosslyn, VA 22209, USA; (C.A.C.); (T.L.)
- Jiani Li
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Po-E Li
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Chien-Chi Lo
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; (M.S.); (P.S.G.C.); (B.H.); (P.-E.L.); (C.-C.L.)
- Sujatha Rashid
  - American Type Culture Collection, Manassas, VA 20110, USA; (D.A.Y.); (R.B.); (S.R.)
- Evguenia S. Maiorova
  - Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA; (R.M.); (J.L.); (E.S.M.)
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Vadim Zalunin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
- Lisa Purcell
  - Vir Biotechnology Inc., San Francisco, CA 94158, USA; (J.d.I.); (L.P.)
- Kim D. Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA; (R.C.); (J.R.B.); (J.G.); (Y.J.); (K.S.K.); (A.K.); (C.X.); (V.Z.)
7
Seither K, Thompson W, Suhrie K. A Practical Guide to Whole Genome Sequencing in the NICU. Neoreviews 2024; 25:e139-e150. [PMID: 38425198 DOI: 10.1542/neo.25-3-e139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
The neonatal period is a peak time for the presentation of genetic disorders that can be diagnosed using whole genome sequencing (WGS). While any one genetic disorder is individually rare, they collectively contribute to significant morbidity, mortality, and health-care costs. As WGS continues to decline in cost and becomes increasingly available, ordering rapid WGS for NICU patients with signs or symptoms of an underlying genetic condition is now feasible. However, many neonatal clinicians are not comfortable with the testing, and unfortunately, there is a dearth of geneticists to facilitate testing for every patient who needs it. Here, we will review the science behind WGS, diagnostic capabilities, limitations of testing, time to consider testing, test initiation, interpretation of results, developing a plan of care that incorporates genomic information, and returning WGS results to families.
Affiliation(s)
- Katelyn Seither
  - Division of Neonatal and Pulmonary Biology, Cincinnati Children's Hospital Medical Center, and the Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH
- Whitney Thompson
  - Division of Neonatal Medicine, and the Department of Clinical Genomics, Mayo Clinic, Rochester, MN
- Kristen Suhrie
  - Division of Neonatology, Department of Pediatrics, and Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
8
Raney BJ, Barber GP, Benet-Pagès A, Casper J, Clawson H, Cline M, Diekhans M, Fischer C, Navarro Gonzalez J, Hickey G, Hinrichs A, Kuhn R, Lee B, Lee C, Le Mercier P, Miga K, Nassar L, Nejad P, Paten B, Perez G, Schmelter D, Speir M, Wick B, Zweig A, Haussler D, Kent W, Haeussler M. The UCSC Genome Browser database: 2024 update. Nucleic Acids Res 2024; 52:D1082-D1088. [PMID: 37953330 PMCID: PMC10767968 DOI: 10.1093/nar/gkad987] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/15/2023] [Revised: 10/06/2023] [Accepted: 10/17/2023] [Indexed: 11/14/2023] Open
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide. It provides annotation data on thousands of genome assemblies, ranging from human to SARS-CoV-2. This year, we have introduced new data from the Human Pangenome Reference Consortium and on viral genomes including SARS-CoV-2. We have added 1,200 new genomes to our GenArk genome system, increasing the overall diversity of our genomic representation. We have added support for nine new user-contributed track hubs to our public hub system. Additionally, we have released 29 new tracks on the human genome and 11 new tracks on the mouse genome. Collectively, these new features expand both the breadth and depth of the genomic knowledge that we share publicly with users worldwide.
Affiliation(s)
- Brian J Raney
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anna Benet-Pagès
  - Institute of Neurogenomics, Helmholtz Zentrum München GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany
- Jonathan Casper
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Melissa S Cline
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clayton Fischer
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Glenn Hickey
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Phillipe Le Mercier
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
- Karen H Miga
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luis R Nassar
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Benedict Paten
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Gerardo Perez
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brittney D Wick
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
9
Feng B, Lai J, Fan X, Liu Y, Wang M, Wu P, Zhou Z, Yan Q, Sun L. Systematic comparison of variant calling pipelines of target genome sequencing cross multiple next-generation sequencers. Front Genet 2024; 14:1293974. [PMID: 38239851 PMCID: PMC10794554 DOI: 10.3389/fgene.2023.1293974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 12/14/2023] [Indexed: 01/22/2024] Open
Abstract
Targeted genomic sequencing (TS) greatly benefits precision oncology by rapidly detecting genetic variations with better accuracy and sensitivity owing to its high sequencing depth. Multiple sequencing platforms and variant calling tools are available for TS, which makes the choice difficult for researchers. A benchmarking study across the different platforms and pipelines available for TS is therefore imperative. In this study, we performed TS of the Reference OncoSpan FFPE (HD832) sample enriched by the TSO500 panel using four commercially available sequencers, and analyzed the resulting 50 datasets using five commonly used bioinformatics pipelines. We systematically investigated the sequencing quality and variant detection sensitivity, aiming to provide optimal recommendations for future research. The four sequencing platforms returned highly concordant results in terms of base quality (Q20 > 94%), sequencing coverage (>97%) and depth (>2000×). Benchmarking revealed good concordance of variant calling across different platforms and pipelines, among which the FASTASeq 300 platform showed the highest sensitivity (100%) and precision (100%) in calling high-confidence variants when analyzed by the SNVer and VarScan 2 algorithms. Furthermore, this sequencer demonstrated the shortest sequencing time (∼21 h) in the PE150 sequencing mode. Through the intersection of the 50 datasets generated in this study, we recommended a novel set of variant genes outside the truth set published for HD832, aiming to supplement HD832 for future research on tumor variant diagnosis. In addition, we applied these five tools to another panel (TargetSeq One) with the Twist cfDNA Pan-cancer Reference Standard; considering SNP and InDel sensitivity together, SNVer and VarScan 2 performed best among them. SNVer and VarScan 2 also performed best for samples from six cancer cell lines regarding SNP and InDel sensitivity. Considering the dissimilarity of variant calls across different pipelines for datasets from the same platform, we recommended an integration of multiple tools to improve variant calling sensitivity and accuracy for the cancer genome. Illumina and GeneMind technologies can be used independently or together by public health laboratories performing tumor TS. SNVer and VarScan 2 perform better regarding variant detection sensitivity for three typical tumor samples. Our study provides a standardized target sequencing resource to benchmark new bioinformatics protocols and sequencing platforms.
Collapse
Affiliation(s)
- Baosheng Feng
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Juan Lai
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Xue Fan
- Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yongfeng Liu
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Miao Wang
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Ping Wu
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Zhiliang Zhou
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Qin Yan
- GeneMind Biosciences Company Limited, Shenzhen, China
| | - Lei Sun
- GeneMind Biosciences Company Limited, Shenzhen, China
|
10
|
Wong WM, Tham YC, Simunovic MP, Chen FK, Luu CD, Chen H, Jin ZB, Shen RJ, Li S, Sui R, Zhao C, Yang L, Bhende M, Raman R, Sen P, Ghosh A, Poornachandra B, Sasongko MB, Arianti A, Chia V, Mangunsong CO, Manurung F, Fujinami K, Ikeda H, Woo SJ, Kim SJ, Mohd Khialdin S, Othman O, Bastion MLC, Kamalden AT, Lott PWP, Fong K, Shunmugam M, Lim A, Thapa R, Pradhan E, Rajkarnikar SP, Adhikari S, Ibañez BMBI, Koh A, Chan CMM, Fenner BJ, Tan TE, Laude A, Ngo WK, Holder GE, Su X, Chen TC, Wang NK, Kang EYC, Huang CH, Surawatsatien N, Pisuchpen P, Sujirakul T, Kumaramanickavel G, Singh M, Leroy B, Michaelides M, Cheng CY, Chen LJ, Chan HW. Rationale and protocol paper for the Asia Pacific Network for inherited eye diseases. Asia Pac J Ophthalmol (Phila) 2024; 13:100030. [PMID: 38233300 DOI: 10.1016/j.apjo.2023.100030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 11/16/2023] [Accepted: 11/21/2023] [Indexed: 01/19/2024] Open
Abstract
PURPOSE There are major gaps in our knowledge of hereditary ocular conditions in the Asia-Pacific population, which comprises approximately 60% of the world's population. Therefore, a concerted regional effort is urgently needed to close this critical knowledge gap and apply precision medicine technology to improve the quality of life of these patients in the Asia-Pacific region. DESIGN Multi-national, multi-center collaborative network. METHODS The Research Standing Committee of the Asia-Pacific Academy of Ophthalmology and the Asia-Pacific Society of Eye Genetics fostered this research collaboration, which brings together renowned institutions and experts for inherited eye diseases in the Asia-Pacific region. The immediate priority of the network will be inherited retinal diseases (IRDs), for which both detailed characterization of the conditions and established registries are lacking. RESULTS The network comprises 55 members from 35 centers, spanning 12 countries and regions, including Australia, China, India, Indonesia, Japan, South Korea, Malaysia, Nepal, Philippines, Singapore, Taiwan, and Thailand. The steering committee comprises ophthalmologists with experience in consortia for eye diseases in the Asia-Pacific region, leading ophthalmologists and vision scientists in the field of IRDs internationally, and ophthalmic geneticists. CONCLUSIONS The Asia Pacific Inherited Eye Disease (APIED) network aims to (1) improve genotyping capabilities and expertise to increase early and accurate genetic diagnosis of IRDs, (2) harmonise deep phenotyping practices and utilization of ontological terms, and (3) establish high-quality, multi-user, federated disease registries that will facilitate patient care, genetic counseling, and research of IRDs regionally and internationally.
Affiliation(s)
- Wendy M Wong
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Ophthalmology, National University Hospital, National University Health System, Singapore
| | - Yih Chung Tham
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew P Simunovic
- Save Sight Institute, The University of Sydney, Sydney, Australia; Retinal Unit, Sydney Eye Hospital, Sydney, Australia
| | - Fred Kuanfu Chen
- Centre for Ophthalmology and Visual Science (Lions Eye Institute), The University of Western Australia, Nedlands, Western Australia, Australia; Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria, Australia
| | - Chi D Luu
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria, Australia; Ophthalmology, Department of Surgery, University of Melbourne, East Melbourne, Victoria, Australia
| | - Haoyu Chen
- Joint Shantou International Eye Center, Shantou University & The Chinese University of Hong Kong, Shantou, China
| | - Zi-Bing Jin
- Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing Ophthalmology & Visual Sciences Key Laboratory, Beijing, China
| | - Ren-Juan Shen
- Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing Ophthalmology & Visual Sciences Key Laboratory, Beijing, China
| | - Shiying Li
- Department of Ophthalmology in Xiang'an Hospital of Xiamen University and Medical Center of Xiamen University, School of Medicine in Xiamen University, Eye Institute of Xiamen University, Fujian Provincial Key Laboratory of Ophthalmology and Visual Science, Fujian Engineering and Research Center of Eye Regenerative Medicine, Xiamen, Fujian, China
| | - Ruifang Sui
- Department of Ophthalmology, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, No. 1, Shuai Fu Yuan, Beijing, China
| | - Chen Zhao
- Eye Institute and Department of Ophthalmology, Eye & ENT Hospital, Fudan University, Shanghai, China
| | - Liping Yang
- Department of Ophthalmology, Peking University Third Hospital, Beijing, China; Beijing Key Laboratory of Restoration of Damaged Ocular Nerve, Peking University Third Hospital, Beijing, China
| | - Muna Bhende
- Shri Bhagwan Mahavir Vitreoretinal services, Medical Research Foundation, Sankara Nethralaya, Chennai, India
| | - Rajiv Raman
- Shri Bhagwan Mahavir Vitreoretinal services, Medical Research Foundation, Sankara Nethralaya, Chennai, India
| | - Parveen Sen
- Shri Bhagwan Mahavir Vitreoretinal services, Medical Research Foundation, Sankara Nethralaya, Chennai, India; Dr Agarwal Eye Hospital, Chandigarh, India
| | - Arkasubhra Ghosh
- GROW Lab, Narayana Nethralaya Foundation, Bangalore, Karnataka, India
| | - B Poornachandra
- Vitreo-Retina Services, Narayana Nethralaya, Bangalore, India
| | - Muhammad Bayu Sasongko
- Department of Ophthalmology, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada - Sardjito Eye Center, Dr. Sardjito General Hospital, Yogyakarta, Indonesia
| | - Alia Arianti
- JEC Eye Hospitals and Clinics, Jakarta, Indonesia
| | - Valen Chia
- JEC Eye Hospitals and Clinics, Jakarta, Indonesia
| | | | | | - Kaoru Fujinami
- Laboratory of Visual Physiology, Division of Vision Research, National Institute of Sensory Organs, NHO Tokyo Medical Center, Tokyo, Japan
| | - Hanako Ikeda
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Sakyo-ku, Kyoto, Japan
| | - Se Joon Woo
- Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
| | - Sang Jin Kim
- Department of Ophthalmology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea
| | - Safinaz Mohd Khialdin
- Department of Ophthalmology, Faculty of Medicine, Universiti Kebangsaan Malaysia Medical Center, Kuala Lumpur, Malaysia; UKM Specialist Children's Hospital, Kuala Lumpur, Malaysia
| | - Othmaliza Othman
- Department of Ophthalmology, Faculty of Medicine, Universiti Kebangsaan Malaysia Medical Center, Kuala Lumpur, Malaysia
| | - Mae-Lynn Catherine Bastion
- Department of Ophthalmology, Faculty of Medicine, Universiti Kebangsaan Malaysia Medical Center, Kuala Lumpur, Malaysia; Hospital Canselor Tuanku Muhriz, Jalan Yaacob Latif, Bandar Tun Razak, Kuala Lumpur, Malaysia
| | - Ain Tengku Kamalden
- UM Eye Research Centre, Department of Ophthalmology, Universiti Malaya, Kuala Lumpur, Malaysia
| | - Pooi Wah Penny Lott
- UM Eye Research Centre, Department of Ophthalmology, Universiti Malaya, Kuala Lumpur, Malaysia
| | | | | | - Amelia Lim
- Ophthalmology, Penang Gleneagles, Malaysia
| | - Raba Thapa
- Tilganga Institute of Ophthalmology, Kathmandu, Nepal
| | - Eli Pradhan
- Tilganga Institute of Ophthalmology, Kathmandu, Nepal
| | | | | | - B Manuel Benjamin Iv Ibañez
- Makati Medical Center, Makati City, Philippines; DOH Eye Center, East Avenue Medical Center, Quezon City, Philippines
| | - Adrian Koh
- Eye & Retina Surgeons, Camden Medical Centre, Singapore, Singapore
| | - Choi Mun M Chan
- Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Ophthalmology & Visual Sciences Academic Clinical Program (EYE ACP), Duke-NUS Medical School, Singapore
| | - Beau J Fenner
- Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Ophthalmology & Visual Sciences Academic Clinical Program (EYE ACP), Duke-NUS Medical School, Singapore
| | - Tien-En Tan
- Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Ophthalmology & Visual Sciences Academic Clinical Program (EYE ACP), Duke-NUS Medical School, Singapore
| | - Augustinus Laude
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Wei Kiong Ngo
- National Healthcare Group Eye Institute, Tan Tock Seng Hospital, Singapore, Singapore
| | - Graham E Holder
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Ophthalmology, National University Hospital, National University Health System, Singapore
| | - Xinyi Su
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Ophthalmology, National University Hospital, National University Health System, Singapore
| | - Ta-Ching Chen
- Department of Ophthalmology, National Taiwan University Hospital, Taipei, Taiwan; Center of Frontier Medicine, National Taiwan University Hospital, Taipei, Taiwan
| | - Nan-Kai Wang
- Department of Ophthalmology, Edward S. Harkness Eye Institute, Columbia University, New York, NY, USA; Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan, Taiwan
| | - Eugene Yu-Chuan Kang
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan, Taiwan; Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Chu-Hsuan Huang
- Department of Ophthalmology, Cathay General Hospital, Taipei, Taiwan
| | - Nuntachai Surawatsatien
- Center of Excellence in Retina, Department of Ophthalmology, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Bangkok, Thailand
| | - Phattrawan Pisuchpen
- Department of Ophthalmology and Division of Academic Affairs, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | - Tharikarn Sujirakul
- Department of Ophthalmology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | | | - Mandeep Singh
- Wilmer Eye Institute, Johns Hopkins Hospital, Baltimore, MD 21287, USA
| | - Bart Leroy
- Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
| | - Michel Michaelides
- Moorfields Eye Hospital, London, United Kingdom and UCL Institute of Ophthalmology, University College London, London, United Kingdom
| | - Ching-Yu Cheng
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Singapore National Eye Centre, Singapore Eye Research Institute, Singapore
| | - Li Jia Chen
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
| | - Hwei Wuen Chan
- Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Ophthalmology, National University Hospital, National University Health System, Singapore; Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
|
11
|
Jia P, Dong L, Yang X, Wang B, Bush SJ, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, Dang N, Ren L, Zhang Y, Wang X, Liang F, Wang Y, Ruan J, Xia H, Zheng Y, Shi L, Lv Y, Wang J, Ye K. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biol 2023; 24:277. [PMID: 38049885 PMCID: PMC10694985 DOI: 10.1186/s13059-023-03116-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/06/2023] Open
Abstract
BACKGROUND Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short and long sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map, and for each haplotype we also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity (including those located at long repeat regions, complex structural variants, and de novo mutations) are systematically examined in this study. CONCLUSIONS In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples which, relative to existing benchmarks, offer expanded genomic coverage and insight into complex variant categories.
Affiliation(s)
- Peng Jia
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Lianhua Dong
- National Institute of Metrology, Beijing, 100029, China
| | - Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tingjie Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Xixi Zhao
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yizhuo Che
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Ningxin Dang
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yujing Zhang
- National Institute of Metrology, Beijing, 100029, China
| | - Xia Wang
- National Institute of Metrology, Beijing, 100029, China
| | - Fan Liang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Yang Wang
- GrandOmics Biosciences, Beijing, 100089, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Han Xia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
| | - Yi Lv
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
| | - Jing Wang
- National Institute of Metrology, Beijing, 100029, China.
| | - Kai Ye
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China.
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
- Faculty of Science, Leiden University, Leiden, 2311EZ, The Netherlands.
|
12
|
Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, Peng R, Hou W, Liu Y, Li J, Yu Y, Zhang N, Shang J, Liang F, Wang D, Chen H, Sun L, Hao L, Scherer A, Nordlund J, Xiao W, Xu J, Tong W, Hu X, Jia P, Ye K, Li J, Jin L, Hong H, Wang J, Fan S, Fang X, Zheng Y, Shi L. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol 2023; 24:270. [PMID: 38012772 PMCID: PMC10680274 DOI: 10.1186/s13059-023-03109-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome. RESULTS We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in-truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet project enable cross-omics validation of variant calls from multiomics data. CONCLUSIONS The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
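The "built-in truth" of a parents-plus-monozygotic-twins design can be illustrated with a generic pedigree check: at any site, the twins are expected to share a genotype, and that genotype should be obtainable by taking one allele from each parent. The sketch below is a simplified illustration under stated assumptions, not the Quartet project's actual method: genotypes are unphased diploid allele-index pairs, the function names are hypothetical, and genuine de novo mutations (which would fail this check) are ignored.

```python
from itertools import product
from typing import Tuple

# Hypothetical representation: unphased diploid genotype as allele indices,
# e.g. (0, 1) for a heterozygous reference/alternate call.
Genotype = Tuple[int, int]


def mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child genotype can be formed from one paternal and one maternal allele."""
    return any(sorted((f, m)) == sorted(child) for f, m in product(father, mother))


def quartet_consistent(twin1: Genotype, twin2: Genotype,
                       father: Genotype, mother: Genotype) -> bool:
    """True if the twins agree and their shared genotype fits Mendelian inheritance."""
    return sorted(twin1) == sorted(twin2) and mendelian_consistent(twin1, father, mother)


# Toy sites (invented for illustration only).
ok_site = quartet_consistent((0, 1), (1, 0), (0, 0), (1, 1))         # twins agree, inheritable
twin_mismatch = quartet_consistent((0, 1), (0, 0), (0, 1), (0, 0))   # twins disagree
mendel_violation = quartet_consistent((1, 1), (1, 1), (0, 0), (0, 1))  # father cannot pass allele 1
```

Under these assumptions, the fraction of called sites that pass such a check gives a rough consistency estimate that does not depend on a benchmark region, which is the idea the abstract alludes to.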
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Xiaoke Duan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | | | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
| | - Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Fan Liang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Depeng Wang
- Nextomics Biosciences Institute, Wuhan, Hubei, China
| | - Hui Chen
- OrigiMed Co., Ltd, Shanghai, China
| | - Lele Sun
- Sequanta Technologies Co., Ltd, Shanghai, China
| | | | - Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | - Jessica Nordlund
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Xin Hu
- Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Peng Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
| | - Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Xiang Fang
- National Institute of Metrology, Beijing, China.
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
- Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
|
13
|
Yang J, Liu Y, Shang J, Chen Q, Chen Q, Ren L, Zhang N, Yu Y, Li Z, Song Y, Yang S, Scherer A, Tong W, Hong H, Xiao W, Shi L, Zheng Y. The Quartet Data Portal: integration of community-wide resources for multiomics quality control. Genome Biol 2023; 24:245. [PMID: 37884999 PMCID: PMC10601216 DOI: 10.1186/s13059-023-03091-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 10/17/2023] [Indexed: 10/28/2023] Open
Abstract
The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established based on a family of four individuals with identical twins from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics, platforms, labs, protocols, and batches. Reproducible analysis tools allow for objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop "distribution-collection-evaluation-integration" workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.
Affiliation(s)
- Jingcheng Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
  - Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Yaqing Liu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zhihui Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yueqiang Song
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shengpeng Yang
  - Intelligent Storage, Alibaba Cloud, Alibaba Group, Hangzhou, Zhejiang, China
- Andreas Scherer
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
  - Office of Oncological Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
  - International Human Phenome Institutes (Shanghai), Shanghai, China.
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China.
14
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
15
Everman ER, Macdonald SJ, Kelly JK. The genetic basis of adaptation to copper pollution in Drosophila melanogaster. Front Genet 2023; 14:1144221. [PMID: 37082199 PMCID: PMC10110907 DOI: 10.3389/fgene.2023.1144221] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/14/2023] [Accepted: 03/21/2023] [Indexed: 04/22/2023] Open
Abstract
Introduction: Heavy metal pollutants can have long lasting negative impacts on ecosystem health and can shape the evolution of species. The persistent and ubiquitous nature of heavy metal pollution provides an opportunity to characterize the genetic mechanisms that contribute to metal resistance in natural populations. Methods: We examined variation in resistance to copper, a common heavy metal contaminant, using wild collections of the model organism Drosophila melanogaster. Flies were collected from multiple sites that varied in copper contamination risk. We characterized phenotypic variation in copper resistance within and among populations using bulked segregant analysis to identify regions of the genome that contribute to copper resistance. Results and Discussion: Copper resistance varied among wild populations with a clear correspondence between resistance level and historical exposure to copper. We identified 288 SNPs distributed across the genome associated with copper resistance. Many SNPs had population-specific effects, but some had consistent effects on copper resistance in all populations. Significant SNPs map to several novel candidate genes involved in refolding disrupted proteins, energy production, and mitochondrial function. We also identified one SNP with consistent effects on copper resistance in all populations near CG11825, a gene involved in copper homeostasis and copper resistance. We compared the genetic signatures of copper resistance in the wild-derived populations to genetic control of copper resistance in the Drosophila Synthetic Population Resource (DSPR) and the Drosophila Genetic Reference Panel (DGRP), two copper-naïve laboratory populations. In addition to CG11825, which was identified as a candidate gene in the wild-derived populations and previously in the DSPR, there was modest overlap of copper-associated SNPs between the wild-derived populations and laboratory populations. Thirty-one SNPs associated with copper resistance in wild-derived populations fell within regions of the genome that were associated with copper resistance in the DSPR in a prior study. Collectively, our results demonstrate that the genetic control of copper resistance is highly polygenic, and that several loci can be clearly linked to genes involved in heavy metal toxicity response. The mixture of parallel and population-specific SNPs points to a complex interplay between genetic background and the selection regime that modifies the effects of genetic variation on copper resistance.
Affiliation(s)
- Stuart J. Macdonald
  - Molecular Biosciences, University of Kansas, Lawrence, KS, United States
  - Center for Computational Biology, University of Kansas, Lawrence, KS, United States
- John K. Kelly
  - Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, United States
16
Zhai Y, Bardel C, Vallée M, Iwaz J, Roy P. Performance comparisons between clustering models for reconstructing NGS results from technical replicates. Front Genet 2023; 14:1148147. [PMID: 37007945 PMCID: PMC10060969 DOI: 10.3389/fgene.2023.1148147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 03/06/2023] [Indexed: 03/18/2023] Open
Abstract
To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may be thus recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
Affiliation(s)
- Yue Zhai
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - *Correspondence: Yue Zhai,
- Claire Bardel
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
  - Service de Génétique, Hospices Civils de Lyon, Bron, France
- Maxime Vallée
  - Cellule Bioinformatique de La Plateforme de Séquençage Haut Débit NGS-HCL, Hospices Civils de Lyon, Bron, France
- Jean Iwaz
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
- Pascal Roy
  - Université Lyon 1, Lyon, France
  - Université de Lyon, Lyon, France
  - Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Service de Biostatistique-Bioinformatique, Hospices Civils de Lyon, Lyon, France
17
Connor R, Yarmosh DA, Maier W, Shakya M, Martin R, Bradford R, Brister JR, Chain PS, Copeland CA, di Iulio J, Hu B, Ebert P, Gunti J, Jin Y, Katz KS, Kochergin A, LaRosa T, Li J, Li PE, Lo CC, Rashid S, Maiorova ES, Xiao C, Zalunin V, Pruitt KD. Towards increased accuracy and reproducibility in SARS-CoV-2 next generation sequence analysis for public health surveillance. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.11.03.515010. [PMID: 36380755 PMCID: PMC9645426 DOI: 10.1101/2022.11.03.515010] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/16/2023]
Abstract
During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak. Our results indicate that bioinformatic workflows can yield consensus genomes with different single nucleotide polymorphisms, insertions, and/or deletions even when using the same raw sequence input datasets. We introduce the use of a specific suite of parameters and protocols that greatly improves the agreement among pipelines developed by diverse organizations. Such consistency among bioinformatic pipelines is fundamental to SARS-CoV-2 and future pathogen surveillance efforts. The application of analysis standards is necessary to more accurately document phylogenomic trends and support data-driven public health responses.
Affiliation(s)
- Ryan Connor
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- David A Yarmosh
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
  - BEI Resources
- Wolfgang Maier
  - Galaxy Europe Team, University of Freiburg, Freiburg, Germany
- Migun Shakya
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Ross Martin
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Rebecca Bradford
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
  - BEI Resources
- J Rodney Brister
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Patrick Sg Chain
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Courtney A Copeland
  - Deloitte Consulting LLP, 1919 North Lynn St, Suite 1500, Rosslyn, VA 22209 USA
- Bin Hu
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Jonathan Gunti
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yumi Jin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kenneth S Katz
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Andrey Kochergin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Tré LaRosa
  - Deloitte Consulting LLP, 1919 North Lynn St, Suite 1500, Rosslyn, VA 22209 USA
- Jiani Li
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Po-E Li
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Chien-Chi Lo
  - Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545 USA
- Sujatha Rashid
  - American Type Culture Collection, 10807 University Blvd, Manassas, VA 20110, USA
- Evguenia S Maiorova
  - Clinical Virology Department, Gilead Sciences, 333 Lakeside Dr, Foster City, CA 94404, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Vadim Zalunin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
18
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022] Open
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and require a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Ruth Roberts
  - ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK
  - University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Timothy R Mercer
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
  - Garvan Institute of Medical Research, Sydney, NSW, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Joshua Xu
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
- Weida Tong
  - National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
19
Khayat MM, Sahraeian SME, Zarate S, Carroll A, Hong H, Pan B, Shi L, Gibbs RA, Mohiyuddin M, Zheng Y, Sedlazeck FJ. Hidden biases in germline structural variant detection. Genome Biol 2021; 22:347. [PMID: 34930391 PMCID: PMC8686633 DOI: 10.1186/s13059-021-02558-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/14/2020] [Accepted: 11/24/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. RESULTS In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. CONCLUSIONS This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.
Affiliation(s)
- Michael M Khayat
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Huixiao Hong
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Bohu Pan
  - National Center for Toxicological Research, Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
  - Institute of Thoracic Oncology, Fudan University, Shanghai, China
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
  - Institute of Thoracic Oncology, Fudan University, Shanghai, China.
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.