1
Yu Y, Hou W, Liu Y, Wang H, Dong L, Mai Y, Chen Q, Li Z, Sun S, Yang J, Cao Z, Zhang P, Zi Y, Liu R, Gao J, Zhang N, Li J, Ren L, Jiang H, Shang J, Zhu S, Wang X, Qing T, Bao D, Li B, Li B, Suo C, Pi Y, Wang X, Dai F, Scherer A, Mattila P, Han J, Zhang L, Jiang H, Thierry-Mieg D, Thierry-Mieg J, Xiao W, Hong H, Tong W, Wang J, Li J, Fang X, Jin L, Xu J, Qian F, Zhang R, Shi L, Zheng Y. Quartet RNA reference materials improve the quality of transcriptomic data through ratio-based profiling. Nat Biotechnol 2024; 42:1118-1132. [PMID: 37679545 PMCID: PMC11251996 DOI: 10.1038/s41587-023-01867-9] [Received: 10/28/2022] [Accepted: 06/15/2023] [Indexed: 09/09/2023]
Abstract
Certified RNA reference materials are indispensable for assessing the reliability of RNA sequencing to detect intrinsically small biological differences in clinical settings, such as molecular subtyping of diseases. As part of the Quartet Project for quality control and data integration of multi-omics profiling, we established four RNA reference materials derived from immortalized B-lymphoblastoid cell lines from four members of a monozygotic twin family. Additionally, we constructed ratio-based transcriptome-wide reference datasets between two samples, providing cross-platform and cross-laboratory 'ground truth'. Investigation of the intrinsically subtle biological differences among the Quartet samples enables sensitive assessment of cross-batch integration of transcriptomic measurements at the ratio level. The Quartet RNA reference materials, combined with the ratio-based reference datasets, can serve as unique resources for assessing and improving the quality of transcriptomic data in clinical and biological settings.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Zhihui Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yi Zi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jian Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Chen Suo
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Yan Pi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Xia Wang
- National Institute of Metrology, Beijing, China
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
- Pirkko Mattila
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, The Netherlands
- Lijun Zhang
- Nanjing Vazyme Biotech Co. Ltd., Nanjing, China
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Jing Wang
- National Institute of Metrology, Beijing, China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- National Center of Gerontology, Beijing, China
- Xiang Fang
- National Institute of Metrology, Beijing, China
- Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Feng Qian
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- National Center of Gerontology, Beijing, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes, Shanghai, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Shanghai Cancer Center, Fudan University, Shanghai, China
2
Zheng Y, Liu Y, Yang J, Dong L, Zhang R, Tian S, Yu Y, Ren L, Hou W, Zhu F, Mai Y, Han J, Zhang L, Jiang H, Lin L, Lou J, Li R, Lin J, Liu H, Kong Z, Wang D, Dai F, Bao D, Cao Z, Chen Q, Chen Q, Chen X, Gao Y, Jiang H, Li B, Li B, Li J, Liu R, Qing T, Shang E, Shang J, Sun S, Wang H, Wang X, Zhang N, Zhang P, Zhang R, Zhu S, Scherer A, Wang J, Wang J, Huo Y, Liu G, Cao C, Shao L, Xu J, Hong H, Xiao W, Liang X, Lu D, Jin L, Tong W, Ding C, Li J, Fang X, Shi L. Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials. Nat Biotechnol 2024; 42:1133-1149. [PMID: 37679543 PMCID: PMC11252085 DOI: 10.1038/s41587-023-01934-1] [Received: 10/25/2022] [Accepted: 07/31/2023] [Indexed: 09/09/2023]
Abstract
Characterization and integration of the genome, epigenome, transcriptome, proteome and metabolome of different datasets is difficult owing to a lack of ground truth. Here we develop and characterize suites of publicly available multi-omics reference materials of matched DNA, RNA, protein and metabolites derived from immortalized cell lines from a family quartet of parents and monozygotic twin daughters. These references provide built-in truth defined by relationships among the family members and the information flow from DNA to RNA to protein. We demonstrate how using a ratio-based profiling approach that scales the absolute feature values of a study sample relative to those of a concurrently measured common reference sample produces reproducible and comparable data suitable for integration across batches, labs, platforms and omics types. Our study identifies reference-free 'absolute' feature quantification as the root cause of irreproducibility in multi-omics measurement and data integration and establishes the advantages of ratio-based multi-omics profiling with common reference materials.
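The ratio-based profiling idea in this abstract (scaling each study sample's feature values to a common reference sample measured concurrently) can be illustrated with a minimal Python sketch. It assumes log2 ratios, drops features not quantified in both samples, and uses made-up numbers; the function name `ratio_profile` is hypothetical and this is not the Quartet Project's processing code.

```python
import math
from typing import Dict


def ratio_profile(study: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Scale a study sample to a common reference measured in the same batch.

    Returns log2(study / reference) for every feature quantified (> 0) in both
    samples. Features missing or zero in either sample are dropped rather than
    imputed, which is a simplifying assumption of this sketch.
    """
    return {
        feature: math.log2(study[feature] / reference[feature])
        for feature in sorted(study.keys() & reference.keys())
        if study[feature] > 0 and reference[feature] > 0
    }


# Hypothetical toy data: batch B reads out every feature 3x higher than batch A
# (a purely multiplicative batch effect, the case in which ratios cancel it exactly).
batch_a = {"study": {"g1": 100.0, "g2": 50.0, "g3": 0.0},
           "ref": {"g1": 50.0, "g2": 50.0, "g3": 10.0}}
batch_b = {"study": {"g1": 300.0, "g2": 150.0, "g3": 0.0},
           "ref": {"g1": 150.0, "g2": 150.0, "g3": 30.0}}

ratios_a = ratio_profile(batch_a["study"], batch_a["ref"])
ratios_b = ratio_profile(batch_b["study"], batch_b["ref"])
print(ratios_a)              # {'g1': 1.0, 'g2': 0.0}
print(ratios_a == ratios_b)  # True
```

Because the study and reference samples in a batch share the same multiplicative scaling, the batch factor cancels in the ratio, so the two batches agree even though their absolute values differ threefold. Real batch effects are rarely purely multiplicative, so this toy case is the idealized limit.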
Affiliation(s)
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- Sha Tian
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Feng Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ling Lin
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
- Jingwei Lou
- Zhangjiang Center for Translational Medicine, Shanghai Biotecan Medical Diagnostics Co. Ltd., Shanghai, China
- Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing, China
- Jingchao Lin
- Metabo-Profile Biotechnology (Shanghai) Co. Ltd., Shanghai, China
- Depeng Wang
- Nextomics Biosciences Institute, Wuhan, China
- Ding Bao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Xingdong Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Bin Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Bingying Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jingjing Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Nextomics Biosciences Institute, Wuhan, China
- Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Tao Qing
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shanyue Sun
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Haiyan Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Xiaolin Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Peipei Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Sibo Zhu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Jiucun Wang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jing Wang
- National Institute of Metrology, Beijing, China
- Yinbo Huo
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Gang Liu
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Chengming Cao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Li Shao
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Xiaozhen Liang
- Shanghai Institute of Immunity and Infection, Chinese Academy of Sciences, Shanghai, China
- Daru Lu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Li Jin
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Weida Tong
- Key Laboratory of Bioanalysis and Metrology for State Market Regulation, Shanghai Institute of Measurement and Testing Technology, Shanghai, China
- Chen Ding
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital, Beijing, China
- Xiang Fang
- National Institute of Metrology, Beijing, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
3
Masood D, Ren L, Nguyen C, Brundu FG, Zheng L, Zhao Y, Jaeger E, Li Y, Cha SW, Halpern A, Truong S, Virata M, Yan C, Chen Q, Pang A, Alberto R, Xiao C, Yang Z, Chen W, Wang C, Cross F, Catreux S, Shi L, Beaver JA, Xiao W, Meerzaman DM. Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome. Genome Biol 2024; 25:163. [PMID: 38902799 PMCID: PMC11188507 DOI: 10.1186/s13059-024-03294-8] [Received: 03/31/2023] [Accepted: 05/29/2024] [Indexed: 06/22/2024]
Abstract
BACKGROUND: Copy number variation (CNV) is a key genetic characteristic in cancer diagnostics and can serve as a biomarker for selecting therapeutic treatments. Using data sets established in our previous study, we benchmark six of the most recent and commonly used CNV-calling software tools on their detection accuracy, sensitivity, and reproducibility. By comparison with orthogonal methods such as microarray and Bionano, we also explore the consistency of CNV calling across different technologies on a challenging genome.
RESULTS: While consistent results are observed for copy gain, loss, and loss of heterozygosity (LOH) calls across sequencing centers, CNV callers, and technologies, variation in CNV calls is mostly driven by the determination of genome ploidy. Using consensus results from six CNV callers and confirmation from three orthogonal methods, we establish a high-confidence CNV call set for the reference cancer cell line HCC1395.
CONCLUSIONS: NGS technologies and current bioinformatics tools can offer reliable results for the detection of copy gain, loss, and LOH. However, when working with a hyper-diploid genome, some software tools can call excessive copy gains or losses because of inaccurate assessment of genome ploidy. By reporting performance metrics under various experimental conditions, this study raises awareness within the cancer research community regarding the selection of sequencing platforms, sample preparation, sequencing coverage, and CNV detection tools.
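The RESULTS paragraph describes building a high-confidence call set from the consensus of six CNV callers plus confirmation by three orthogonal methods. A minimal Python sketch of one possible per-segment voting rule follows; the function name `consensus_call`, the state labels, the thresholds, and the tie-handling rule are all assumptions for illustration, not the procedure used in the study.

```python
from collections import Counter
from typing import List, Optional


def consensus_call(caller_calls: List[str], orthogonal_calls: List[str],
                   min_callers: int, min_orthogonal: int) -> Optional[str]:
    """Return a high-confidence CNV state for one segment, or None.

    caller_calls: one state per NGS-based CNV caller, e.g. "gain", "loss",
        "LOH" or "neutral" (hypothetical labels).
    orthogonal_calls: states reported for the same segment by orthogonal
        platforms (e.g. microarray, Bionano).
    min_callers / min_orthogonal: illustrative thresholds, not study values.
    """
    if not caller_calls:
        return None
    ranked = Counter(caller_calls).most_common()
    state, votes = ranked[0]
    # Reject segments without enough caller support or with a tied top state.
    if votes < min_callers or (len(ranked) > 1 and ranked[1][1] == votes):
        return None
    # Require independent confirmation of the same state by orthogonal methods.
    confirmations = sum(1 for call in orthogonal_calls if call == state)
    return state if confirmations >= min_orthogonal else None


# Hypothetical segments: six caller states and three orthogonal states each.
segments = {
    "chr1:0-1000000": (["gain"] * 5 + ["neutral"], ["gain", "gain", "neutral"]),
    "chr2:0-1000000": (["loss", "loss", "neutral", "neutral", "gain", "LOH"],
                       ["loss"] * 3),
}
for name, (callers, orthogonal) in segments.items():
    print(name, consensus_call(callers, orthogonal, min_callers=4, min_orthogonal=2))
```

With these thresholds, the first segment is kept as a gain and the second is left uncalled, because no single state reaches four of six callers even though all three orthogonal methods report a loss.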
Affiliation(s)
- Daniall Masood
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Cu Nguyen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Lily Zheng
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yong Li
- Illumina Inc., San Diego, CA, USA
- Chunhua Yan
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Qingrong Chen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Andy Pang
- Bionano Genomics, San Diego, CA, 20892, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- Zhaowei Yang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Frank Cross
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Julia A Beaver
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Oncology Center of Excellence, Food and Drug Administration, Silver Spring, MD, USA
- Wenming Xiao
- Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD 20993, USA
- Daoud M Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
4
Hanssen F, Garcia MU, Folkersen L, Pedersen A, Lescai F, Jodoin S, Miller E, Seybold M, Wacker O, Smith N, Gabernet G, Nahnsen S. Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery. NAR Genom Bioinform 2024; 6:lqae031. [PMID: 38666213 PMCID: PMC11044436 DOI: 10.1093/nargab/lqae031] [Received: 02/16/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024]
Abstract
DNA variation analysis has become indispensable in many aspects of modern biomedicine, most prominently in the comparison of normal and tumor samples. Thousands of samples are collected in local sequencing efforts and public databases, requiring highly scalable, portable, and automated workflows for streamlined processing. Here, we present nf-core/sarek 3, a well-established, comprehensive variant calling and annotation pipeline for germline and somatic samples. It is suitable for any genome with a known reference. We present a full rewrite of the original pipeline that significantly reduces storage requirements by using the CRAM format and reduces runtime by increasing intra-sample parallelization. Together, these changes yield a 70% cost reduction in commercial clouds, enabling users to perform large-scale and cross-platform data analysis while keeping costs and CO2 emissions low. The code is available at https://nf-co.re/sarek.
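The abstract attributes part of the runtime reduction to increased intra-sample parallelization. A common way to achieve this is to scatter one sample's genome into intervals that are processed as independent tasks and then gathered. The Python sketch below shows only the interval-splitting step; the function names, the fixed chunk size, and the BED-style output are assumptions for illustration, not nf-core/sarek's implementation (which is written in Nextflow).

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 0-based half-open as in BED


def split_into_intervals(contig_lengths: Dict[str, int], max_bp: int) -> List[Interval]:
    """Split each contig into consecutive chunks no longer than max_bp bases.

    Each chunk can then be handed to an independent task (scatter), and the
    per-chunk results concatenated afterwards (gather).
    """
    if max_bp <= 0:
        raise ValueError("max_bp must be positive")
    intervals: List[Interval] = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, max_bp):
            intervals.append((contig, start, min(start + max_bp, length)))
    return intervals


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as BED lines (contig<TAB>start<TAB>end)."""
    return "\n".join(f"{c}\t{s}\t{e}" for c, s, e in intervals)


# Hypothetical toy genome with two contigs
chunks = split_into_intervals({"chrA": 2500, "chrB": 800}, max_bp=1000)
print(len(chunks))  # 4
print(to_bed(chunks))
```

Smaller chunks raise the number of tasks that can run at once but add per-task scheduling overhead, so the chunk size is a tuning trade-off rather than a fixed value.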
Affiliation(s)
- Friederike Hanssen
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Computer Science, Eberhard-Karls University of Tübingen, 72076 Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Cluster of Excellence iFIT (EXC 2180) ‘Image-Guided and Functionally Instructed Tumor Therapies’, Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
- Maxime U Garcia
- Seqera Labs, Carrer de Marià Aguilò, 28, Barcelona 08005, Spain
- Barntumörbanken, Department of Oncology-Pathology, Karolinska Institutet, BioClinicum, Visionsgatan 4, Solna 17164, Sweden
- National Genomics Infrastructure, SciLifeLab, Tomtebodavägen 23, Solna 17165, Sweden
- Francesco Lescai
- Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia, via Ferrata, 9, Pavia, 27100 PV, Italy
- Susanne Jodoin
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Edmund Miller
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080, USA
- Matthias Seybold
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Oskar Wacker
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Nicholas Smith
- Department of Informatics, Technical University of Munich, Boltzmannstr. 3, Garching, 85748 Bavaria, Germany
- Gisela Gabernet
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Pathology, Yale School of Medicine, 300 George, New Haven, CT 06510, USA
- Sven Nahnsen
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Computer Science, Eberhard-Karls University of Tübingen, 72076 Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Cluster of Excellence iFIT (EXC 2180) ‘Image-Guided and Functionally Instructed Tumor Therapies’, Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
5
Schmid K, Sehring J, Németh A, Harter PN, Weber KJ, Vengadeswaran A, Storf H, Seidemann C, Karki K, Fischer P, Dohmen H, Selignow C, von Deimling A, Grau S, Schröder U, Plate KH, Stein M, Uhl E, Acker T, Amsel D. DistSNE: Distributed computing and online visualization of DNA methylation-based central nervous system tumor classification. Brain Pathol 2024; 34:e13228. [PMID: 38012085 PMCID: PMC11007060 DOI: 10.1111/bpa.13228] [Received: 10/29/2023] [Accepted: 11/10/2023] [Indexed: 11/29/2023]
Abstract
The current state-of-the-art analysis of central nervous system (CNS) tumors through DNA methylation profiling relies on the tumor classifier developed by Capper and colleagues, which centrally harnesses DNA methylation data provided by users. Here, we present a distributed-computing-based approach for CNS tumor classification that achieves performance comparable to centralized systems while safeguarding privacy. We use the t-distributed stochastic neighbor embedding (t-SNE) model for dimensionality reduction and visualization of tumor classification results in two-dimensional graphs in a distributed approach across multiple sites (DistSNE). DistSNE provides an intuitive web interface (https://gin-tsne.med.uni-giessen.de) for user-friendly local data management and federated methylome-based tumor classification calculations for multiple collaborators in a DataSHIELD environment. The freely accessible web interface supports convenient data upload, result review, and summary report generation. Importantly, the larger sample size achieved through distributed access to additional datasets allows DistSNE to improve cluster analysis and enhance predictive power. Collectively, DistSNE enables simple and fast classification of CNS tumors using large-scale methylation data from distributed sources, while maintaining privacy and allowing easy, flexible network expansion to other institutes. This approach holds great potential for advancing human brain tumor classification and fostering collaborative precision medicine in neuro-oncology.
Affiliation(s)
- Kai Schmid
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Jannik Sehring
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Attila Németh
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Patrick N. Harter
- Neurological Institute (Edinger Institute), University Hospital Frankfurt, Frankfurt, Germany
- Present address: Center for Neuropathology and Prion Research, University Hospital of Munich, Munich, Germany
- Katharina J. Weber
- Neurological Institute (Edinger Institute), University Hospital Frankfurt, Frankfurt, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
- Frankfurt Cancer Institute (FCI), Frankfurt, Germany
- University Cancer Center (UCT) Frankfurt, Frankfurt, Germany
- Abishaa Vengadeswaran
- Medical Informatics Group (MIG), Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
- Holger Storf
- Medical Informatics Group (MIG), Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
- Kapil Karki
- DIZ Marburg, Philipps University Marburg, Marburg, Germany
- Patrick Fischer
- Institute for Medical Informatics, Justus‐Liebig University, Giessen, Germany
- Department of Neuropathology, German Cancer Research Center (DKFZ), Universitätsklinikum Heidelberg, and CCU Neuropathology, Heidelberg, Germany
- Hildegard Dohmen
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Carmen Selignow
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Stefan Grau
- Department of Neurosurgery, Hospital Fulda, Fulda, Germany
- Uwe Schröder
- Department of Neurosurgery, MVZ Frankfurt/Oder, Frankfurt, Germany
- Karl H. Plate
- Neurological Institute (Edinger Institute), University Hospital Frankfurt, Frankfurt, Germany
- Marco Stein
- Department of Neurosurgery, University Hospital Giessen und Marburg, Location Giessen, Giessen, Germany
- Eberhard Uhl
- Department of Neurosurgery, University Hospital Giessen und Marburg, Location Giessen, Giessen, Germany
- Till Acker
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
- Daniel Amsel
- Institute of Neuropathology, Justus‐Liebig University Giessen, Giessen, Germany
6
Okojie J, O’Neal N, Burr M, Worley P, Packer I, Anderson D, Davis J, Kearns B, Fatema K, Dixon K, Barrott JJ. DNA Quantity and Quality Comparisons between Cryopreserved and FFPE Tumors from Matched Pan-Cancer Samples. Curr Oncol 2024; 31:2441-2452. [PMID: 38785464 PMCID: PMC11119490 DOI: 10.3390/curroncol31050183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 04/25/2024] [Accepted: 04/27/2024] [Indexed: 05/25/2024]
Abstract
Personalized cancer care requires molecular characterization of neoplasms. While the research community accepts frozen tissue as the gold-standard analyte for molecular assays, the tissue tested in clinical cancer care comes almost universally from formalin-fixed, paraffin-embedded (FFPE) samples. As newer DNA characterization technologies emerge that require higher-molecular-weight DNA, it became necessary to compare DNA quality, in terms of DNA length, between FFPE and cryopreserved samples. We hypothesized that cryopreserved samples would yield a higher quantity and superior quality of DNA compared with FFPE samples. We analyzed DNA metrics in a head-to-head comparison between FFPE and cryopreserved samples from 38 human tumors representing various cancer types. DNA quantity and purity were measured by UV spectrophotometry, and DNA from cryopreserved tissue showed a 4.2-fold increase in DNA yield per mg of tissue (p-value < 0.001). DNA quality was measured on a fragment microelectrophoresis analyzer; again, DNA from cryopreserved tissue showed a 223% increase in the DNA quality number and a 9-fold increase in DNA fragments > 40,000 bp (p-value < 0.0001). DNA from the cryopreserved tissues was superior to DNA from FFPE samples in terms of both yield and quality.
Affiliation(s)
- Jeffrey Okojie
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Department of Biomedical and Pharmaceutical Sciences, Idaho State University, Pocatello, ID 83209, USA
- Nikole O’Neal
- Department of Biomedical and Pharmaceutical Sciences, Idaho State University, Pocatello, ID 83209, USA
- Mackenzie Burr
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Peyton Worley
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Isaac Packer
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- DeLaney Anderson
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Jack Davis
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Bridger Kearns
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Kaniz Fatema
- Department of Biomedical and Pharmaceutical Sciences, Idaho State University, Pocatello, ID 83209, USA
- Ken Dixon
- Specicare, 690 Medical Park Ln, Gainesville, GA 30501, USA
- Jared J. Barrott
- Department of Cell Biology & Physiology, Brigham Young University, Provo, UT 84602, USA
- Department of Biomedical and Pharmaceutical Sciences, Idaho State University, Pocatello, ID 83209, USA
- Specicare, 690 Medical Park Ln, Gainesville, GA 30501, USA
- Simmons Center for Cancer Research, Brigham Young University, Provo, UT 84602, USA
7
Ergun MA, Cinal O, Bakışlı B, Emül AA, Baysan M. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are mapped to a reference genome to identify variants. Because of the size and complexity of the data, these operations must be performed computationally, and the need for analysis software has produced many programs for the mapping, variant calling, and annotation steps. Currently, most programs are either expensive enterprise software whose proprietary code makes access and verification very difficult, or open-access programs that are mostly command-line tools lacking user interfaces and extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm risky. Given the growth of sequencing technologies, user-friendly open-source tools that offer comparative analysis are an important need. RESULTS Here, we propose the Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, and structural variant calling, copy number variation, microsatellite instability, and fusion analysis, along with their annotations. COSAP ships with a fully functional, user-friendly web interface and a backend server, allowing fully independent deployment at both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ , respectively. All services are also packaged as Docker containers.
Thanks to the modular structure, pipelines that combine algorithms can be customized, and new algorithms can be added with minimal coding. CONCLUSIONS COSAP simplifies and speeds up DNA sequencing analyses by providing commonly used algorithms for SNV, indel, and structural variant calling, copy number variation, microsatellite instability, and fusion analysis, as well as their annotations. COSAP ships with a fully functional, user-friendly web interface and a backend server, allowing fully independent deployment at both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial for establishing the reproducibility of sequencing analyses.
Affiliation(s)
- Mehmet Arif Ergun
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Omer Cinal
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Berkant Bakışlı
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Abdullah Asım Emül
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Baysan
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
8
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers better mappability and long-range phasing, which substantially improve germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus, a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark, Severus achieved the highest F1 scores (the harmonic mean of precision and recall) among the long-read and short-read methods compared. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings and revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
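The F1 score cited above combines precision, TP/(TP+FP), and recall, TP/(TP+FN), into their harmonic mean. A minimal Python sketch of that calculation follows; the function name `f1_score` and the counts are illustrative only and do not come from the Severus benchmark.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from benchmark counts.

    tp: calls matching the truth set, fp: calls absent from it,
    fn: truth-set events the caller missed.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one caller on one tumor/normal pair:
# precision = 0.90, recall = 0.75
print(round(f1_score(tp=90, fp=10, fn=30), 3))  # 0.818
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 by maximizing recall at the expense of precision, or vice versa.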
Affiliation(s)
- Ayse Keskus
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A. Lansdon
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Samuel Sacco
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K. Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Irina Pushel
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S. Farooqi
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
9
Cebeci YE, Erturk RA, Ergun MA, Baysan M. Improving somatic exome sequencing performance by biological replicates. BMC Bioinformatics 2024; 25:124. [PMID: 38519906 PMCID: PMC10958848 DOI: 10.1186/s12859-024-05742-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 03/13/2024] [Indexed: 03/25/2024]
Abstract
BACKGROUND Next-generation sequencing (NGS) technologies offer fast and inexpensive identification of DNA sequences. Somatic sequencing is among the primary applications of NGS; it identifies acquired (non-inherited) variants by comparing diseased and healthy tissues from the same individual. Somatic mutations in genetic diseases such as cancer are tightly associated with genomic instability. Genomic instability increases heterogeneity, further complicating sequencing, a task already challenged by short read lengths and repetitive sequences in human DNA. This leads to low concordance among studies and limits reproducibility. This limitation is a significant problem because mutations identified by somatic sequencing are major biomarkers for diagnosis and the primary input for targeted therapies. Benchmarking studies have been conducted to assess error rates and improve reproducibility. Unfortunately, very few somatic benchmarking sets exist because true somatic variants are difficult to validate. Moreover, most NGS benchmarking studies are based on relatively simpler germline (inherited) sequencing. Recently, a comprehensive somatic sequencing benchmarking set was published by the Sequencing Quality Control Phase 2 (SEQC2) consortium. We chose this dataset for our experiments because it is a well-validated, cancer-focused dataset that includes many tumor/normal biological replicates. Our study has two primary goals. The first is to determine how replicate-based consensus approaches can improve the accuracy of somatic variant detection systems. The second is to develop highly predictive machine learning (ML) models by using replicate-based consensus variants as labels during training. RESULTS Ensemble approaches that combine alternative algorithms are relatively common; here, as an alternative, we study the potential of biological replicates to enhance performance.
We first developed replicate-based consensus approaches that use the biological replicates available in this study to improve variant calling performance. We then trained ML models using these biological replicates and achieved performance comparable to that of optimal ML models, i.e., those trained on high-confidence variants identified in advance. CONCLUSIONS Our replicate-based consensus approach can be used to improve variant calling performance and to develop efficient ML models. Given the relative ease of obtaining biological replicates, this strategy allows for the development of efficient ML models tailored to specific datasets or scenarios.
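The abstract does not spell out the consensus rule, so the following is only a minimal sketch of one plausible replicate-based consensus: a variant is kept if it is called in at least `min_support` of the replicate call sets. The `(chrom, pos, ref, alt)` representation, the function name, and the threshold are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter

# Assumed representation of a call: (chromosome, position, REF allele, ALT allele)
Variant = tuple[str, int, str, str]


def consensus_calls(replicates: list[set[Variant]], min_support: int) -> set[Variant]:
    """Keep variants called in at least `min_support` replicate call sets.

    Each replicate is a set, so a variant is counted at most once per replicate.
    """
    counts = Counter(v for calls in replicates for v in calls)
    return {v for v, n in counts.items() if n >= min_support}


# Hypothetical call sets from three tumor/normal replicate pairs
rep1 = {("chr1", 100, "A", "T"), ("chr2", 500, "G", "C")}
rep2 = {("chr1", 100, "A", "T"), ("chr3", 42, "C", "G")}
rep3 = {("chr1", 100, "A", "T"), ("chr2", 500, "G", "C")}
print(sorted(consensus_calls([rep1, rep2, rep3], min_support=2)))
# [('chr1', 100, 'A', 'T'), ('chr2', 500, 'G', 'C')]
```

In the spirit of the abstract's second goal, a consensus set like this could stand in for a precomputed high-confidence truth set when labeling training examples; how strict `min_support` should be is a design choice that trades recall for label purity.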
Affiliation(s)
- Yunus Emre Cebeci
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Rumeysa Aslihan Erturk
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Arif Ergun
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Baysan
- Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
10
Simpson JT. Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.26.582089. [PMID: 38464143 PMCID: PMC10925087 DOI: 10.1101/2024.02.26.582089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
DNA sequencing of tumours to identify somatic mutations has become a critical tool for guiding the treatment given to cancer patients. The gold standard for mutation calling is to compare sequencing data from the tumour with a matched normal sample, to avoid misclassifying inherited SNPs as mutations. This procedure works extremely well, but in certain situations only a tumour sample is available. While approaches have been developed to find mutations without a matched normal, they have limited accuracy or require specific types of input data (e.g. ultra-deep sequencing). Here we explore the application of single-molecule long-read sequencing to calling somatic mutations without matched normal samples. We develop a simple theoretical framework to show how haplotype phasing is an important source of information for determining whether a variant is a somatic mutation. We then use simulations to assess the range of experimental parameters (tumour purity, sequencing depth) over which this approach is effective. These ideas are developed into a prototype somatic mutation caller, smrest, whose use is demonstrated on two highly mutated cancer cell lines. Finally, we argue that this approach has the potential to measure clinically important biomarkers that are based on the genome-wide distribution of mutations: tumour mutation burden and mutation signatures.
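The intuition behind using phasing can be illustrated with a toy model; this is a hypothetical sketch, not smrest's algorithm. It assumes each read covering a candidate site carries a haplotype tag (1 or 2) and a flag for whether it shows the alternate allele. A heterozygous germline variant should appear on nearly every read of one haplotype, whereas a somatic mutation arising in a subset of cells appears on only part of that haplotype's reads. The class name, function name, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Read:
    haplotype: int  # 1 or 2, assumed to come from an upstream phasing step
    has_alt: bool   # whether this read carries the candidate alternate allele


def classify_variant(reads: list[Read],
                     germline_min_frac: float = 0.9,
                     somatic_max_other_frac: float = 0.05) -> str:
    """Toy classification of a candidate variant from haplotype-tagged reads."""
    fractions = {}
    for hp in (1, 2):
        hp_reads = [r for r in reads if r.haplotype == hp]
        if not hp_reads:
            return "unphased"  # cannot compare haplotypes without reads on both
        fractions[hp] = sum(r.has_alt for r in hp_reads) / len(hp_reads)
    carrier = max(fractions, key=fractions.get)  # haplotype with more alt support
    other = 2 if carrier == 1 else 1
    if fractions[other] > somatic_max_other_frac:
        return "ambiguous"   # alt allele seen on both haplotypes
    if fractions[carrier] >= germline_min_frac:
        return "germline"    # alt on (nearly) every read of one haplotype
    if fractions[carrier] > 0:
        return "somatic"     # alt on only a subset of one haplotype's reads
    return "reference"


# Alt allele on 4 of 10 haplotype-1 reads and no haplotype-2 reads
reads = [Read(1, i < 4) for i in range(10)] + [Read(2, False) for _ in range(10)]
print(classify_variant(reads))  # somatic
```

Under this toy model, in a diploid region a clonal somatic mutation is expected on roughly a fraction p of its haplotype's reads at tumour purity p, so as purity approaches 1 it becomes hard to distinguish from a germline variant. Sequencing depth matters too, since per-haplotype fractions are noisy when few reads are available.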
Affiliation(s)
- Jared T. Simpson
- Ontario Institute for Cancer Research, Toronto, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Department of Computer Science, University of Toronto, Toronto, Canada
11
Jia P, Dong L, Yang X, Wang B, Bush SJ, Wang T, Lin J, Wang S, Zhao X, Xu T, Che Y, Dang N, Ren L, Zhang Y, Wang X, Liang F, Wang Y, Ruan J, Xia H, Zheng Y, Shi L, Lv Y, Wang J, Ye K. Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biol 2023; 24:277. [PMID: 38049885 PMCID: PMC10694985 DOI: 10.1186/s13059-023-03116-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 11/21/2023] [Indexed: 12/06/2023]
Abstract
BACKGROUND Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short- and long-read sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map, and for each haplotype we use the long reads to generate a haplotype-resolved whole-genome assembly whose completeness and continuity exceed those of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity, including those located in long repeat regions, complex structural variants, and de novo mutations, are systematically examined in this study. CONCLUSIONS In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples that, relative to existing benchmarks, offer expanded genomic coverage and insight into complex variant categories.
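The size classes in the catalogue above use a 50 bp cut-off between indels and large insertions/deletions. A minimal sketch of binning a simple VCF-style REF/ALT pair by that cut-off follows; it assumes size is measured as the length difference between the alleles, and the function name and the "MNV/other" label for equal-length multi-base substitutions are illustrative assumptions. Inversions and complex SVs, which the study also catalogues, cannot be identified from allele lengths alone and are out of scope here.

```python
def size_class(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Bin a simple REF/ALT allele pair by the < 50 bp / >= 50 bp cut-off."""
    delta = len(alt) - len(ref)  # > 0 insertion, < 0 deletion, 0 substitution
    if delta == 0:
        return "SNV" if len(ref) == 1 else "MNV/other"
    if abs(delta) < sv_threshold:
        return "indel"
    return "large insertion" if delta > 0 else "large deletion"


print(size_class("A", "T"))             # SNV
print(size_class("A", "AT"))            # indel
print(size_class("A" + "C" * 50, "A"))  # large deletion (50 bp removed)
```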
Affiliation(s)
- Peng Jia
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Lianhua Dong
- National Institute of Metrology, Beijing, 100029, China
- Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Tingjie Wang
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Xixi Zhao
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Yizhuo Che
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Ningxin Dang
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Yujing Zhang
- National Institute of Metrology, Beijing, 100029, China
- Xia Wang
- National Institute of Metrology, Beijing, 100029, China
- Fan Liang
- GrandOmics Biosciences, Beijing, 100089, China
- Yang Wang
- GrandOmics Biosciences, Beijing, 100089, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Han Xia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, 200438, China
- Yi Lv
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- Jing Wang
- National Institute of Metrology, Beijing, 100029, China
- Kai Ye
- National Local Joint Engineering Research Center for Precision Surgery & Regenerative Medicine, Center for Mathematical Medical, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
- Genome Institute, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China
- Faculty of Science, Leiden University, Leiden, 2311EZ, The Netherlands
12
Vozza G, Bonetti E, Tini G, Favalli V, Frigè G, Bucci G, De Summa S, Zanfardino M, Zapelloni F, Mazzarella L. Benchmarking and improving the performance of variant-calling pipelines with RecallME. BIOINFORMATICS (OXFORD, ENGLAND) 2023; 39:btad722. [PMID: 38092052 DOI: 10.1093/bioinformatics/btad722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 09/03/2023] [Indexed: 12/25/2023]
Abstract
MOTIVATION The steady growth of whole-genome/exome sequencing and the development of novel next-generation sequencing (NGS)-based gene panels require continuous testing and validation of variant calling (VC) pipelines, as well as detection of sequencing-related issues, to keep these pipelines up to date and suitable for clinical settings. State-of-the-art tools are reliable for computing standard performance metrics. However, the need for automated software that discriminates between bioinformatic and sequencing issues and optimizes VC parameters remains unmet. RESULTS The aim of the current work is to present RecallME, a bioinformatic suite that tracks down difficult-to-detect variants, such as insertions and deletions in highly repetitive regions, thus providing the maximum reachable recall for both single-nucleotide variants and small insertions and deletions, and precisely guides the user through the pipeline optimization process. AVAILABILITY AND IMPLEMENTATION Source code is freely available under the MIT license at https://github.com/mazzalab-ieo/recallme. The RecallME web application is available at https://translational-oncology-lab.shinyapps.io/recallme/. To use RecallME, users must obtain their own ANNOVAR license.
Affiliation(s)
- Gianluca Vozza
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hematology-Oncology, Università degli Studi di Milano, Milan, Italy
| | - Emanuele Bonetti
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Milan, Italy
- Department of Oncology and Hematology-Oncology, Università degli Studi di Milano, Milan, Italy
| | - Giulia Tini
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Milan, Italy
- Gianmaria Frigè
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Milan, Italy
- Gabriele Bucci
- Center for Omics Sciences, IRCCS Ospedale San Raffaele, 20132 Milano, Italy
- Simona De Summa
- Molecular Diagnostics and Pharmacogenetics Unit, IRCCS Istituto Tumori, "Giovanni Paolo II", Bari, Italy
- Luca Mazzarella
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Milan, Italy
13
Yang J, Liu Y, Shang J, Chen Q, Chen Q, Ren L, Zhang N, Yu Y, Li Z, Song Y, Yang S, Scherer A, Tong W, Hong H, Xiao W, Shi L, Zheng Y. The Quartet Data Portal: integration of community-wide resources for multiomics quality control. Genome Biol 2023; 24:245. [PMID: 37884999 PMCID: PMC10601216 DOI: 10.1186/s13059-023-03091-9] [Received: 11/08/2022] [Accepted: 10/17/2023] [Indexed: 10/28/2023]
Abstract
The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established based on a family of four individuals with identical twins from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics, platforms, labs, protocols, and batches. Reproducible analysis tools allow for objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop "distribution-collection-evaluation-integration" workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.
Affiliation(s)
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Yaqing Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zhihui Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yueqiang Song
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shengpeng Yang
- Intelligent Storage, Alibaba Cloud, Alibaba Group, Hangzhou, Zhejiang, China
- Andreas Scherer
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
- Office of Oncological Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
14
Lysenkova Wiklander M, Övernäs E, Lagensjö J, Raine A, Petri A, Wiman AC, Ramsell J, Marincevic-Zuniga Y, Gezelius H, Martin T, Bunikis I, Ekberg S, Erlandsson R, Larsson P, Mosbech MB, Häggqvist S, Hellstedt Kerje S, Feuk L, Ameur A, Liljedahl U, Nordlund J. Genomic, transcriptomic and epigenomic sequencing data of the B-cell leukemia cell line REH. BMC Res Notes 2023; 16:265. [PMID: 37817248 PMCID: PMC10566058 DOI: 10.1186/s13104-023-06537-2] [Received: 04/20/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023]
Abstract
OBJECTIVES The aim of this data paper is to describe a collection of 33 genomic, transcriptomic and epigenomic sequencing datasets of the B-cell acute lymphoblastic leukemia (ALL) cell line REH. REH is one of the most frequently used cell lines for functional studies of pediatric ALL, and these data provide a multi-faceted characterization of its molecular features. The datasets described herein, generated with short- and long-read sequencing technologies, can both provide insights into the complex aberrant karyotype of REH, and be used as reference datasets for sequencing data quality assessment or for methods development. DATA DESCRIPTION This paper describes 33 datasets corresponding to 867 gigabases of raw sequencing data generated from the REH cell line. These datasets include five different approaches for whole genome sequencing (WGS) on four sequencing platforms, two RNA sequencing (RNA-seq) techniques on two different sequencing platforms, DNA methylation sequencing, and single-cell ATAC-sequencing.
Affiliation(s)
- Mariya Lysenkova Wiklander
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Elin Övernäs
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Johanna Lagensjö
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Amanda Raine
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Anna Petri
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ann-Christin Wiman
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Jon Ramsell
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Yanara Marincevic-Zuniga
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Henrik Gezelius
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Tom Martin
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Ignas Bunikis
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Sara Ekberg
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Rikard Erlandsson
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Pontus Larsson
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Mai-Britt Mosbech
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Susana Häggqvist
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Susanne Hellstedt Kerje
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Lars Feuk
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Adam Ameur
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ulrika Liljedahl
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
- Jessica Nordlund
- Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Box 1432, Uppsala, SE-751 44, Sweden
15
Ichikawa K, Kawahara R, Asano T, Morishita S. A landscape of complex tandem repeats within individual human genomes. Nat Commun 2023; 14:5530. [PMID: 37709751 PMCID: PMC10502081 DOI: 10.1038/s41467-023-41262-1] [Received: 08/26/2022] [Accepted: 08/28/2023] [Indexed: 09/16/2023]
Abstract
Markedly expanded tandem repeats (TRs) have been correlated with ~60 diseases. TR diversity has been considered a clue toward understanding missing heritability. However, haplotype-resolved long TRs remain mostly hidden or blacked out because their complex structures (TRs composed of various units and minisatellites containing >10-bp units) make them difficult to determine accurately with existing methods. Here, by applying a high-precision algorithm for determining complex TR structures to long, accurate PacBio HiFi reads from 270 Japanese control samples, we obtain several genome-wide findings. Approximately 322,000 TRs are difficult to impute from the surrounding single-nucleotide variants. Greater genetic divergence of TR loci is significantly correlated with more events of younger replication slippage. Complex TRs are more abundant than single-unit TRs, and a tendency for complex TRs to consist of <10-bp units and single-unit TRs to be minisatellites is statistically significant at loci with ≥500-bp TRs. Of note, 8909 loci with extended TRs (>100 bp longer than the mode) contain several known disease-associated TRs and are considered candidates for association with disorders. Overall, complex TRs and minisatellites are found to be abundant and diverse, even in genetically small Japanese populations, yielding insights into the landscape of long TRs.
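The extended-TR criterion above (an allele more than 100 bp longer than the most common allele length at its locus) can be sketched in Python. This assumes per-sample TR allele lengths have already been measured for each locus; the function names and the shortest-length tie-break for loci with several equally common lengths are illustrative choices, not the authors' algorithm.

```python
from collections import Counter


def locus_mode(lengths: list[int]) -> int:
    """Most frequent allele length; ties are broken by the shortest length (an assumption)."""
    counts = Counter(lengths)
    top = max(counts.values())
    return min(length for length, n in counts.items() if n == top)


def extended_alleles(lengths: list[int], min_excess: int = 100) -> list[int]:
    """Allele lengths exceeding the locus mode by more than `min_excess` bp."""
    if not lengths:
        return []
    mode = locus_mode(lengths)
    return [length for length in lengths if length - mode > min_excess]


def flag_extended_loci(loci: dict[str, list[int]], min_excess: int = 100) -> list[str]:
    """Names of loci carrying at least one extended allele, sorted alphabetically."""
    return sorted(name for name, lengths in loci.items() if extended_alleles(lengths, min_excess))


if __name__ == "__main__":
    # Hypothetical allele lengths (bp) observed across samples at two loci.
    loci = {"TR_A": [300, 300, 310, 450], "TR_B": [120, 120, 180]}
    print(flag_extended_loci(loci))  # ['TR_A']
```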
Affiliation(s)
- Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Riki Kawahara
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Takeshi Asano
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 277-8561, Chiba, Japan
16
Nguyen BQT, Tran TPD, Nguyen HT, Nguyen TN, Pham TMQ, Nguyen HTP, Tran DH, Nguyen V, Tran TS, Pham TVN, Le MT, Phan MD, Giang H, Nguyen HN, Tran LS. Improvement in neoantigen prediction via integration of RNA sequencing data for variant calling. Front Immunol 2023; 14:1251603. [PMID: 37731488 PMCID: PMC10507271 DOI: 10.3389/fimmu.2023.1251603] [Received: 07/02/2023] [Accepted: 08/17/2023] [Indexed: 09/22/2023]
Abstract
Introduction Neoantigen-based immunotherapy has emerged as a promising strategy for improving the life expectancy of cancer patients. This therapeutic approach relies heavily on accurate identification of cancer mutations using DNA sequencing (DNAseq) data. However, current workflows tend to yield a large number of neoantigen candidates, of which only a limited number elicit efficient and immunogenic T-cell responses suitable for downstream clinical evaluation. To overcome this limitation and increase the number of high-quality immunogenic neoantigens, we propose integrating RNA sequencing (RNAseq) data into the mutation identification step of the neoantigen prediction workflow. Methods In this study, we characterize the mutation profiles identified from DNAseq and/or RNAseq data in tumor tissues of 25 patients with colorectal cancer (CRC). Immunogenicity was then validated by ELISpot assay using synthetic long peptides (sLP). Results Only 22.4% of the detected variants were shared between the two methods, and RNAseq-derived variants displayed unique features of affinity and immunogenicity. We further established that neoantigen candidates identified from RNAseq data significantly increased the number of highly immunogenic neoantigens (confirmed by ELISpot) that would otherwise be overlooked if relying solely on DNAseq data. Discussion This integrative approach holds great potential for improving the selection of neoantigens for personalized cancer immunotherapy, ultimately leading to enhanced treatment outcomes and improved survival rates for cancer patients.
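How a shared-variant percentage such as the 22.4% reported above could be computed is sketched below in Python, assuming variants are keyed by (chromosome, position, reference allele, alternate allele) and that the percentage is taken over the union of both call sets. The abstract does not state its denominator, so that choice is an assumption, and `compare_call_sets` is a hypothetical name.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def compare_call_sets(dna: set[Variant], rna: set[Variant]) -> dict[str, float]:
    """Partition DNAseq and RNAseq variant calls and report the shared fraction."""
    shared = dna & rna
    union = dna | rna
    return {
        "dna_only": len(dna - rna),
        "rna_only": len(rna - dna),
        "shared": len(shared),
        # Assumed definition: variants found by both methods / all distinct variants.
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical calls; real input would come from normalized VCF records.
    dna_calls = {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C"), ("chr7", 42, "C", "T")}
    rna_calls = {("chr1", 1000, "A", "T"), ("chr12", 25, "C", "A")}
    print(compare_call_sets(dna_calls, rna_calls))
```

Using the smaller call set, or the DNAseq set alone, as the denominator would give a different percentage, which is why the denominator should be stated alongside any overlap figure.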
Affiliation(s)
- Huu Thinh Nguyen
- University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Duc Huy Tran
- University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Vy Nguyen
- Medical Genetics Institute, Ho Chi Minh, Vietnam
- Thanh Sang Tran
- University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Minh-Triet Le
- University Medical Center Ho Chi Minh City, Ho Chi Minh, Vietnam
- Hoa Giang
- Medical Genetics Institute, Ho Chi Minh, Vietnam
- Le Son Tran
- Medical Genetics Institute, Ho Chi Minh, Vietnam
17
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
18
Talsania K, Shen TW, Chen X, Jaeger E, Li Z, Chen Z, Chen W, Tran B, Kusko R, Wang L, Pang AWC, Yang Z, Choudhari S, Colgan M, Fang LT, Carroll A, Shetty J, Kriga Y, German O, Smirnova T, Liu T, Li J, Kellman B, Hong K, Hastie AR, Natarajan A, Moshrefi A, Granat A, Truong T, Bombardi R, Mankinen V, Meerzaman D, Mason CE, Collins J, Stahlberg E, Xiao C, Wang C, Xiao W, Zhao Y. Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Genome Biol 2022; 23:255. [PMID: 36514120 PMCID: PMC9746098 DOI: 10.1186/s13059-022-02816-6] [Received: 10/17/2021] [Accepted: 11/17/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND The cancer genome is commonly altered by thousands of structural rearrangements, including insertions, deletions, translocations, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and a matched normal control derived from the same donor, which were used as reference samples in our companion benchmarking studies. RESULTS We systematically investigated somatic SVs in the reference cancer cell line by comparing it with a matched normal cell line using multiple NGS platforms, including Illumina short reads, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus call set of 1788 SVs for the reference cancer cell line, comprising 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy. CONCLUSIONS A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
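One simple way to build a multi-platform consensus of the kind described above is to keep an SV only if calls of the same type, on the same chromosome, with both breakpoints within a fixed tolerance appear on at least a minimum number of platforms. The Python sketch below assumes that rule, a 500 bp tolerance and a two-platform minimum; these parameters and names are hypothetical and not the consortium's actual procedure. Translocations, whose partner breakpoint lies on another chromosome, would need a richer record than this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    svtype: str  # e.g. "DEL", "DUP", "INS", "INV"
    chrom: str
    start: int
    end: int


def same_event(a: SV, b: SV, tol: int = 500) -> bool:
    """Treat two calls as one event if type and chromosome match and both breakpoints are within tol bp."""
    return (
        a.svtype == b.svtype
        and a.chrom == b.chrom
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def consensus(calls_by_platform: dict[str, list[SV]], min_support: int = 2, tol: int = 500) -> list[SV]:
    """Keep one representative per event supported by at least min_support platforms."""
    platforms = sorted(calls_by_platform)
    kept: list[SV] = []
    for platform in platforms:
        for sv in calls_by_platform[platform]:
            if any(same_event(sv, k, tol) for k in kept):
                continue  # event already represented in the consensus
            # Count platforms (including this one) with a matching call.
            support = sum(
                any(same_event(sv, other, tol) for other in calls_by_platform[p])
                for p in platforms
            )
            if support >= min_support:
                kept.append(sv)
    return kept


if __name__ == "__main__":
    calls = {
        "illumina": [SV("DEL", "chr1", 1000, 2000)],
        "pacbio": [SV("DEL", "chr1", 1100, 2050), SV("INV", "chr2", 5000, 9000)],
        "ont": [SV("DEL", "chr1", 990, 1990)],
    }
    print(consensus(calls))
```

The pairwise comparison is quadratic in the number of calls, which is adequate for a sketch; a production pipeline would index calls by chromosome and position.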
Affiliation(s)
- Keyur Talsania
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tsai-Wei Shen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Xiongfong Chen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Zhipan Li
- Sentieon Inc, Mountain View, CA, USA
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Limin Wang
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Zhaowei Yang
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Sulbha Choudhari
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Michael Colgan
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc, 1301 Shoreway Road, Belmont, CA, 94002, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Oksana German
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tatyana Smirnova
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tiantian Liu
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Jing Li
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Karl Hong
- Bionano Genomics, San Diego, CA 92121, USA
- Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Jack Collins
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Eric Stahlberg
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wenming Xiao
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
19
Krishnamachari K, Lu D, Swift-Scott A, Yeraliyev A, Lee K, Huang W, Leng SN, Skanderup AJ. Accurate somatic variant detection using weakly supervised deep learning. Nat Commun 2022; 13:4248. [PMID: 35869060 PMCID: PMC9307817 DOI: 10.1038/s41467-022-31765-8] [Received: 11/03/2021] [Accepted: 06/29/2022] [Indexed: 11/09/2022]
Abstract
Identification of somatic mutations in tumor samples is commonly based on statistical methods in combination with heuristic filters. Here we develop VarNet, an end-to-end deep learning approach for identification of somatic variants from aligned tumor and matched normal DNA reads. VarNet is trained using image representations of 4.6 million high-confidence somatic variants annotated in 356 tumor whole genomes. We benchmark VarNet across a range of publicly available datasets, demonstrating performance often exceeding current state-of-the-art methods. Overall, our results demonstrate how a scalable deep learning approach could augment and potentially supplant human engineered features and heuristic filters in somatic variant calling.
20
Xiao C, Chen Z, Chen W, Padilla C, Colgan M, Wu W, Fang LT, Liu T, Yang Y, Schneider V, Wang C, Xiao W. Personalized genome assembly for accurate cancer somatic mutation discovery using tumor-normal paired reference samples. Genome Biol 2022; 23:237. [PMID: 36352452 PMCID: PMC9648002 DOI: 10.1186/s13059-022-02803-x] [Received: 05/05/2021] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND The use of a personalized haplotype-specific genome assembly, rather than an unrelated, mosaic genome like GRCh38, as a reference for detecting the full spectrum of somatic events in cancers has long been advocated but has never been explored in tumor-normal paired samples. Here, we provide the first demonstrated use of a de novo assembled personalized genome as a reference for cancer mutation detection and quantify the effects of the reference genome on the accuracy of somatic mutation detection. RESULTS We generate de novo assemblies of the first tumor-normal paired genomes, both nuclear and mitochondrial, derived from the same individual with triple negative breast cancer. The personalized genome was chromosome-scale, haplotype-phased, and annotated. We demonstrate that it provides individual-specific haplotypes for complex regions and medically relevant genes. We illustrate that the personalized genome reference not only improves read alignments for both short-read and long-read sequencing data but also improves the detection accuracy of somatic SNVs and SVs. We identify equivalent somatic mutation calls between the two genome references and uncover novel somatic mutations detectable only when the personalized genome assembly is used as a reference. CONCLUSIONS Our findings demonstrate that use of a personalized genome with individual-specific haplotypes is essential for accurate detection of the full spectrum of somatic mutations in paired tumor-normal samples. The unique resource and methodology established in this study will be beneficial to the development of precision oncology not only for breast cancer but also for other cancers.
Affiliation(s)
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Cory Padilla
- Dovetail Genomics, 100 Enterprise Way, Scotts Valley, CA 95066 USA
- Michael Colgan
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
- Wenjun Wu
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Li-Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Road, Belmont, CA 94002 USA
- Tiantian Liu
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Yibin Yang
- Blood Cell Development and Function Program, Fox Chase Cancer Center, Philadelphia, PA 19111 USA
- Valerie Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20894 USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, 11021 Campus St., Loma Linda, CA 92350 USA
- Wenming Xiao
- The Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD USA
21
Del Corvo M, Mazzara S, Pileri SA. TOSCA: an automated Tumor Only Somatic CAlling workflow for somatic mutation detection without matched normal samples. BIOINFORMATICS ADVANCES 2022; 2:vbac070. [PMID: 36699358 PMCID: PMC9710689 DOI: 10.1093/bioadv/vbac070] [Received: 04/08/2022] [Revised: 09/16/2022] [Accepted: 09/22/2022] [Indexed: 01/28/2023]
Abstract
Motivation Accurate classification of somatic variants in a tumor sample is often accomplished by using a paired normal tissue sample from the same patient to separate private germline mutations from somatic variants. However, a paired normal sample is not always available, which makes reliable somatic variant calling more challenging. In silico screening of variants against public or private databases and other filtering approaches are often used in the absence of a paired normal sample. Nevertheless, the difficulty of performing tumor-only calling with sufficient accuracy and the lack of open-source software have limited their application in clinical research. Results To address these limitations, we developed TOSCA, the first automated tumor-only somatic calling workflow for whole-exome and targeted panel sequencing data, which performs an end-to-end analysis from raw read files, via quality checks, alignment and variant calling, to functional annotation, database filtering, tumor purity and ploidy estimation, and variant classification. Applying our workflow to tumor-only data yields estimates of somatic and germline variants that are consistent with results from paired analyses. Availability and implementation TOSCA is a Snakemake-based workflow and is freely available at https://github.com/mdelcorvo/TOSCA. Supplementary information Supplementary data are available at Bioinformatics Advances online.
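A common heuristic behind tumor-only classification combines a population-database filter with a purity-aware variant allele frequency (VAF) check: at a diploid locus without copy-number change, a heterozygous germline variant is expected near VAF 0.5, whereas a clonal heterozygous somatic variant is expected near purity/2. The Python sketch below illustrates only that heuristic; the thresholds, the `Call` record and the `classify` function are hypothetical and are not TOSCA's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    chrom: str
    pos: int
    vaf: float  # variant allele frequency in the tumor sample
    pop_af: Optional[float] = None  # population allele frequency; None if absent from the database


def classify(call: Call, purity: float, max_pop_af: float = 0.001, tol: float = 0.1) -> str:
    """Label a tumor-only call as 'germline', 'somatic' or 'ambiguous'.

    Assumes a diploid locus without copy-number change or loss of heterozygosity;
    max_pop_af and tol are illustrative thresholds, not validated defaults.
    """
    # Variants common in the population are treated as germline outright.
    if call.pop_af is not None and call.pop_af > max_pop_af:
        return "germline"
    expected_somatic = purity / 2  # clonal heterozygous somatic variant
    # Heterozygous (~0.5) or homozygous (~1.0) germline expectation.
    near_germline = abs(call.vaf - 0.5) <= tol or call.vaf >= 1.0 - tol
    near_somatic = abs(call.vaf - expected_somatic) <= tol
    if near_germline and not near_somatic:
        return "germline"
    if near_somatic and not near_germline:
        return "somatic"
    return "ambiguous"  # e.g. high purity makes the two expectations overlap


if __name__ == "__main__":
    for c in [Call("chr1", 100, 0.21), Call("chr1", 200, 0.49), Call("chr1", 300, 0.2, pop_af=0.05)]:
        print(c.chrom, c.pos, classify(c, purity=0.4))
```

As the last branch shows, this heuristic loses power at high tumor purity, which is one reason tumor-only workflows also estimate purity and ploidy rather than relying on VAF alone.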
Affiliation(s)
- Saveria Mazzara
- Division of Haematopathology, IEO, European Institute of Oncology IRCCS, Milan 20141, Italy
- Stefano A Pileri
- Division of Haematopathology, IEO, European Institute of Oncology IRCCS, Milan 20141, Italy
22
Zhang Y, Blomquist TM, Kusko R, Stetson D, Zhang Z, Yin L, Sebra R, Gong B, Lococo JS, Mittal VK, Novoradovskaya N, Yeo JY, Dominiak N, Hipp J, Raymond A, Qiu F, Arib H, Smith ML, Brock JE, Farkas DH, Craig DJ, Crawford EL, Li D, Morrison T, Tom N, Xiao W, Yang M, Mason CE, Richmond TA, Jones W, Johann DJ, Shi L, Tong W, Willey JC, Xu J. Deep oncopanel sequencing reveals within block position-dependent quality degradation in FFPE processed samples. Genome Biol 2022; 23:141. [PMID: 35768876 PMCID: PMC9241261 DOI: 10.1186/s13059-022-02709-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Accepted: 06/15/2022] [Indexed: 11/13/2022]
Abstract
Background: Clinical laboratories routinely use formalin-fixed paraffin-embedded (FFPE) tissue or cell block cytology samples in oncology panel sequencing to identify mutations that can predict patient response to targeted therapy. To understand the technical error due to FFPE processing, a robustly characterized diploid cell line was used to create FFPE samples with four different pre-tissue processing formalin fixation times. A total of 96 FFPE sections were then distributed to different laboratories for targeted sequencing analysis by four oncopanels, and variants resulting from technical error were identified. Results: Tissue sections that fail more frequently show low cellularity, or library preparation DNA input or target sequencing depth below the recommended level. Importantly, sections from block surfaces are more likely to show FFPE-specific errors, akin to “edge effects” seen in histology, while the inner samples display no quality degradation related to fixation time. Conclusions: To assure reliable results, we recommend avoiding the block surface portion and restricting mutation detection to genomic regions of high confidence. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02709-8.
Affiliation(s)
- Yifan Zhang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Thomas M Blomquist
- (Formerly) Department of Pathology, College of Medicine and Life Sciences, The University of Toledo, Toledo, OH, 43614, USA; Lucas County Coroner's Office, 2595 Arlington Ave, Toledo, OH, 43614, USA
- Rebecca Kusko
- Immuneering Corporation, 245 Main St, Cambridge, MA, 02142, USA
- Daniel Stetson
- AstraZeneca Pharmaceuticals, 35 Gatehouse Dr, Waltham, MA, 02451, USA
- Zhihong Zhang
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
- Lihui Yin
- (Formerly) Pathology and Laboratory Medicine Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Robert Sebra
- Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY, 10029, USA
- Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Vinay K Mittal
- Thermo Fisher Scientific, 110 Miller Ave, Ann Arbor, MI, 48104, USA
- Ji-Youn Yeo
- Department of Pathology, University of Toledo, 3000 Arlington Ave, Toledo, OH, 43614, USA
- Nicole Dominiak
- Department of Pathology, University of Toledo, 3000 Arlington Ave, Toledo, OH, 43614, USA
- Jennifer Hipp
- Department of Pathology, Strata Oncology, Inc., Ann Arbor, MI, 48103, USA
- Amelia Raymond
- AstraZeneca Pharmaceuticals, 35 Gatehouse Dr, Waltham, MA, 02451, USA
- Fujun Qiu
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
- Hanane Arib
- Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY, 10029, USA
- Melissa L Smith
- Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY, 10029, USA
- Jay E Brock
- Pathology and Laboratory Medicine Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Daniel H Farkas
- Pathology and Laboratory Medicine Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- Daniel J Craig
- Department of Medicine, College of Medicine and Life Sciences, The University of Toledo, Toledo, OH, 43614, USA
- Erin L Crawford
- Department of Medicine, College of Medicine and Life Sciences, The University of Toledo, Toledo, OH, 43614, USA
- Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Tom Morrison
- Accugenomics, Inc., 1410 Commonwealth Drive, Suite 105, Wilmington, NC, 20403, USA
- Nikola Tom
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic; EATRIS ERIC - European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081 HZ, Amsterdam, The Netherlands
- Wenzhong Xiao
- Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA; Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
- Mary Yang
- Department of Information Science, University of Arkansas at Little Rock, 2801 S. Univ. Ave, Little Rock, AR, 72204, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
- Todd A Richmond
- Market & Application Development Bioinformatics, Roche Sequencing Solutions Inc., 4300 Hacienda Dr, Pleasanton, CA, 94588, USA
- Wendell Jones
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd, Morrisville, NC, 27560, USA
- Donald J Johann
- Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR, 72205, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China; Human Phenome Institute, Fudan University, Shanghai, 201203, China; Fudan-Gospel Joint Research Center for Precision Medicine, Fudan University, Shanghai, 200438, China
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- James C Willey
- Departments of Medicine, Pathology, and Cancer Biology, College of Medicine and Life Sciences, University of Toledo Health Sciences Campus, 3000 Arlington Ave, Toledo, OH, 43614, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
23
Peng R, Lin G, Li L, Li J. Development of a Novel Reference Material for Tumor Mutational Burden Measurement Based on CRISPR/Cas9 Technology. Front Oncol 2022; 12:845636. [PMID: 35574377 PMCID: PMC9098197 DOI: 10.3389/fonc.2022.845636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 04/08/2022] [Indexed: 12/03/2022]
Abstract
As a biomarker that affects treatment decisions for immune checkpoint inhibitors, the accuracy, reliability, and comparability of tumor mutational burden (TMB) estimation are of paramount importance. To improve the consistency and reliability of these tests, qualified reference materials providing ground-truth data are crucial. In this study, we developed a set of formalin-fixed paraffin-embedded (FFPE) samples with different TMB values as novel reference materials for TMB estimation. By introducing several clinically relevant variants in the MutS homolog 2 (MSH2) and DNA polymerase epsilon (POLE) genes into human cell lines using CRISPR/Cas9 technology, we first constructed four typical cell lines, which were verified to have hypermutator or ultramutator phenotypes. The novel FFPE samples were then prepared by cell mixing and paraffin embedding. We confirmed that our novel FFPE samples have a sufficient quantity of cells and high reproducibility, and that they can provide a matched wild-type sample as the genetic background. Whole exome sequencing validation on two platforms showed that our FFPE samples were also highly flexible, as they contain different TMB values spanning a clinically relevant range (2.0–106.1 mut/Mb). Without limitations on production and TMB values, our novel FFPE samples based on CRISPR/Cas9 editing are suitable as candidate reference materials. From a practical point of view, these samples can be used for the validation, verification, internal quality control, and proficiency testing of TMB assessment.
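TMB values such as the 2.0–106.1 mut/Mb range above are a count of somatic mutations divided by the size of the interrogated region in megabases. The sketch below shows only that arithmetic, assuming the mutation count and the callable region size are already known; which variants qualify for the count (coding, non-synonymous, above an allele-frequency cutoff, and so on) differs between assays and is not specified here. The function name `tmb_per_mb` is hypothetical.

```python
def tmb_per_mb(somatic_mutations: int, callable_bases: int) -> float:
    """Return tumor mutational burden as somatic mutations per megabase (mut/Mb).

    `somatic_mutations` is the number of variants passing the assay's own
    inclusion rules; `callable_bases` is the interrogated region size in bp.
    Both are assumed to be computed upstream (hypothetical inputs).
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be a positive number of base pairs")
    megabases = callable_bases / 1_000_000  # convert bp to Mb
    return somatic_mutations / megabases


if __name__ == "__main__":
    # Hypothetical example: 70 qualifying mutations over a 35 Mb callable exome.
    print(f"{tmb_per_mb(70, 35_000_000):.1f} mut/Mb")
```

With these illustrative inputs the script prints `2.0 mut/Mb`; the same ratio applies to panels, where the much smaller denominator makes TMB estimates more sensitive to each individual false call.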
Affiliation(s)
- Rongxue Peng
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- Guigao Lin
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- Lin Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
24
Wang D, Zhang Y, Li R, Li J, Zhang R. Consistency and reproducibility of large panel next-generation sequencing: Multi-laboratory assessment of somatic mutation detection on reference materials with mismatch repair and proofreading deficiency. J Adv Res 2022; 44:161-172. [PMID: 36725187 PMCID: PMC9937796 DOI: 10.1016/j.jare.2022.03.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Revised: 03/16/2022] [Accepted: 03/27/2022] [Indexed: 02/04/2023]
Abstract
INTRODUCTION: Clinical precision oncology increasingly relies on accurate genome-wide profiling using large panel next-generation sequencing; however, accurate and consistent detection of somatic mutations across individual platforms and pipelines remains an open problem. OBJECTIVES: To obtain paired tumor-normal reference materials that can be constructed effectively and are interchangeable with clinical samples, and to use these reference samples to evaluate the performance of 56 panels under routine testing conditions. METHODS: Genes involved in mismatch repair and DNA proofreading were knocked down using CRISPR-Cas9 technology to accumulate somatic mutations in a defined GM12878 cell line. The resulting cells were used as reference materials to comprehensively evaluate the reproducibility and accuracy of oncopanel detection results and to explore potential influencing factors. RESULTS: In total, 14 paired tumor-normal reference DNA samples from engineered cell lines were prepared, and a reference dataset comprising 168 somatic mutations in a 1.8 Mb high-confidence region was generated. For mutations with an allele frequency (AF) above 5% in the reference samples, the 56 panels collectively reported 1306 errors, including 729 false negatives (FNs), 179 false positives (FPs), and 398 reproducibility errors. Performance varied among panels, with precision ranging from 0.773 to 1 and recall from 0.683 to 1. Incorrect and inadequate filtering accounted for a large proportion of false discoveries (including FNs and FPs), while low-quality detection, cross-contamination, and other sequencing errors during the wet bench process were other sources of FNs and FPs. In addition, low AF (<5%) considerably influenced the reproducibility and comparability among panels.
CONCLUSIONS: This study provided an integrated practice for developing reference standards to assess oncopanels in detecting somatic mutations and quantitatively revealed the sources of detection errors. It will promote optimization, validation, and quality control among laboratories, with potential applicability in clinical use.
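The per-panel precision and recall reported in this abstract follow the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The minimal sketch below computes both from hypothetical counts for a single panel; it does not model the reproducibility errors that the study tallied separately, and the class and function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PanelCounts:
    """Variant-level counts for one panel compared against a reference call set."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: PanelCounts) -> float:
    """TP / (TP + FP): share of reported variants that are in the reference set.

    Returns 0.0 when the panel reported nothing (a convention; the ratio is undefined).
    """
    reported = c.true_positives + c.false_positives
    return c.true_positives / reported if reported else 0.0


def recall(c: PanelCounts) -> float:
    """TP / (TP + FN): share of reference variants the panel detected.

    Returns 0.0 when the reference set is empty (a convention; the ratio is undefined).
    """
    expected = c.true_positives + c.false_negatives
    return c.true_positives / expected if expected else 0.0


if __name__ == "__main__":
    # Hypothetical panel: 90 of 100 reference mutations detected, plus 5 spurious calls.
    panel = PanelCounts(true_positives=90, false_positives=5, false_negatives=10)
    print(f"precision={precision(panel):.3f} recall={recall(panel):.3f}")
```

For these made-up counts the script prints `precision=0.947 recall=0.900`. Keeping FPs and FNs as separate fields mirrors how the study attributes the two error types to different causes (filtering versus wet-bench issues).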
Affiliation(s)
- Duo Wang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Yuanfeng Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
25
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022]
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with various diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and call for a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Ruth Roberts
- ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK
- University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Timothy R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
- Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
26
Sahraeian SME, Fang LT, Karagiannis K, Moos M, Smith S, Santana-Quintero L, Xiao C, Colgan M, Hong H, Mohiyuddin M, Xiao W. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 2022; 23:12. [PMID: 34996510 PMCID: PMC8740374 DOI: 10.1186/s13059-021-02592-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Accurate detection of somatic mutations is challenging but critical to understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated its performance advantages on in silico data. RESULTS: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established by the consortium for a cancer cell line, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. CONCLUSIONS: The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, and was significantly superior to conventional detection approaches both in general and in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
Affiliation(s)
- Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA, 95050, USA
- Konstantinos Karagiannis
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Malcolm Moos
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Sean Smith
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Luis Santana-Quintero
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Michael Colgan
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Huixiao Hong
- Bioinformatics Branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR, 72079, USA
- Wenming Xiao
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
27
Zhao X, Hu AC, Wang S, Wang X. Calling small variants using universality with Bayes-factor-adjusted odds ratios. Brief Bioinform 2021; 23:6427501. [PMID: 34791010 DOI: 10.1093/bib/bbab458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Revised: 09/26/2021] [Accepted: 10/07/2021] [Indexed: 11/12/2022]
Abstract
The application of next-generation sequencing in research, and particularly in clinical routine, requires highly accurate variant calling. Here we describe UVC, a method for calling small variants of germline or somatic origin. By unifying opposite assumptions with sublation, we discovered two empirical laws that improve variant calling: allele fraction at high sequencing depth is inversely proportional to the cube root of the variant-calling error rate, and odds ratios adjusted with Bayes factors can model various sequencing biases. UVC outperformed other variant callers on the GIAB germline truth sets; on 192 scenarios of in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities; on the GIAB somatic truth sets derived from physical mixture; and on the SEQC2 somatic reference sets derived from the breast-cancer cell line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma. UVC also outperformed other unique molecular identifier (UMI)-aware variant callers on the datasets used to publish those variant callers. Performance was measured as the sensitivity-specificity trade-off for called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provided additional insight into DNA damage repair. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.
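The first empirical law stated in this abstract can be written as a proportionality. Writing $\mathrm{AF}$ for the allele fraction at high sequencing depth and $\varepsilon$ for the variant-calling error rate (symbols chosen here for illustration, not taken from the paper), the law and its algebraic inverse read:

$$\mathrm{AF} \propto \varepsilon^{-1/3} \quad\Longleftrightarrow\quad \varepsilon \propto \mathrm{AF}^{-3}$$

Read this way, and only within the regime where the law holds, doubling the allele fraction would correspond to an eightfold lower error rate, since $2^{-3} = 1/8$.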
Affiliation(s)
- Xiaofei Zhao
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Allison C Hu
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Sizhen Wang
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Xiaoyue Wang
- State Key Laboratory of Medical Molecular Biology, Center for Bioinformatics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
28
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study. Sci Data 2021; 8:296. [PMID: 34753956 PMCID: PMC8578599 DOI: 10.1038/s41597-021-01077-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/23/2021] [Accepted: 10/11/2021] [Indexed: 11/08/2022]
Abstract
With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices for cancer mutation detection using NGS, as well as standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets from well-characterized reference samples represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.
29
Mercer TR, Xu J, Mason CE, Tong W. The Sequencing Quality Control 2 study: establishing community standards for sequencing in precision medicine. Genome Biol 2021; 22:306. [PMID: 34749795 PMCID: PMC8574019 DOI: 10.1186/s13059-021-02528-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
Affiliation(s)
- Tim R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Australia
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA