1
Gong B, Li D, Zhang Y, Kusko R, Lababidi S, Cao Z, Chen M, Chen N, Chen Q, Chen Q, Dai J, Gan Q, Gao Y, Guo M, Hariani G, He Y, Hou W, Jiang H, Kushwaha G, Li JL, Li J, Li Y, Liu LC, Liu R, Liu S, Meriaux E, Mo M, Moore M, Moss TJ, Niu Q, Patel A, Ren L, Saremi NF, Shang E, Shang J, Song P, Sun S, Urban BJ, Wang D, Wang S, Wen Z, Xiong X, Yang J, Yin L, Zhang C, Zhang R, Bhandari A, Cai W, Eterovic AK, Megherbi DB, Shi T, Suo C, Yu Y, Zheng Y, Novoradovskaya N, Sears RL, Shi L, Jones W, Tong W, Xu J. Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project. Sci Rep 2024; 14:7028. [PMID: 38528062 DOI: 10.1038/s41598-024-57439-7] [Received: 09/25/2023] [Accepted: 03/18/2024] [Indexed: 03/27/2024] Open
Abstract
Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known-positive set contained only a limited number of indels. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review by 42 reviewers, two advisors, and a judging panel of three researchers extended the known indel set by an additional 516 indels. The extended benchmarking indel set spans a wide range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling, particularly across low VAF values. Indel length also varied, but the majority were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extended indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, was the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.
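To make the idea of benchmarking against a truth set concrete, the following is a minimal sketch, not the SEQC2 or precisionFDA scoring method. It assumes indels have already been normalized (for example, left-aligned) so that identical variants compare equal as tuples. The `Indel` type, the `benchmark` function, and the toy variants are all hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class Indel(NamedTuple):
    """Hypothetical indel record; assumes prior normalization (e.g. left-alignment)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(truth: set, calls: set) -> dict:
    """Compare a pipeline's indel calls with a truth set by exact match."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed by the pipeline
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only (not real SEQC2 variants).
truth_set = {
    Indel("chr1", 1000, "AT", "A"),
    Indel("chr2", 2000, "G", "GCC"),
    Indel("chr7", 3000, "CTTT", "C"),
}
pipeline_calls = {
    Indel("chr1", 1000, "AT", "A"),
    Indel("chr2", 2000, "G", "GCC"),
    Indel("chr9", 4000, "T", "TA"),
}

result = benchmark(truth_set, pipeline_calls)
print(result)
```

Exact tuple matching is the simplest possible comparison; a real evaluation would also need to handle equivalent representations of the same indel and could stratify results by VAF or indel length.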
Affiliation(s)
- Binsheng Gong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Dan Li
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Yifan Zhang
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Rebecca Kusko
  - Cellino Bio, 750 Main Street, Cambridge, MA, 02143, USA
- Samir Lababidi
  - Office of Data Analytics and Research, Office of Digital Transformation, Office of the Commissioner, U.S. Food and Drug Administration, Silver Spring, MD, 20993, USA
- Zehui Cao
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Mingyang Chen
  - Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Ning Chen
  - iGeneTech Bioscience Co., Ltd., 8 Shengmingyuan Rd., Changping, Beijing, China
- Qiaochu Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Qingwang Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Jiacheng Dai
  - Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Qiang Gan
  - Clinical Diagnostics Division, Thermo Fisher Scientific, 46500 Kato Rd., Fremont, CA, 94538, USA
- Yuechen Gao
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Mingkun Guo
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Gunjan Hariani
  - Q squared Solutions Genomics, 2400 Ellis Road, Durham, NC, 27703, USA
- Yujie He
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Wanwan Hou
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- He Jiang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Garima Kushwaha
  - Guardant Health, Inc., 505 Penobscot Drive, Redwood City, CA, 94063, USA
- Jian-Liang Li
  - Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Jianying Li
  - Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Yulan Li
  - College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Liang-Chun Liu
  - Clinical Diagnostics Division, Thermo Fisher Scientific, 46500 Kato Rd., Fremont, CA, 94538, USA
- Ruimei Liu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Shiming Liu
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Edwin Meriaux
  - CMINDS Research Center, University of Massachusetts, Lowell, MA, 01854, USA
- Mengqing Mo
  - Department of Epidemiology, School of Public Health, Fudan University, Shanghai, 200032, China
- Tyler J Moss
  - Eurofins Viracor, LLC, 18000 W 99th St., Lenexa, KS, 66219, USA
- Quanne Niu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ananddeep Patel
  - Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Nedda F Saremi
  - Agilent Technologies, Inc., 11011 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Erfei Shang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Jun Shang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ping Song
  - Cancer Genomics Laboratory, Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX, 77030, USA
- Siqi Sun
  - ResearchDx, Irvine, CA, 92618, USA
- Brent J Urban
  - Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Danke Wang
  - Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Shangzi Wang
  - State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Zhining Wen
  - College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Xiangyi Xiong
  - College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Jingcheng Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Lihui Yin
  - PathGroup, Nashville, TN, 37217, USA
- Chao Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ruolan Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Wanshi Cai
  - iGeneTech Bioscience Co., Ltd., 8 Shengmingyuan Rd., Changping, Beijing, China
- Agda Karina Eterovic
  - Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Dalila B Megherbi
  - CMINDS Research Center, University of Massachusetts, Lowell, MA, 01854, USA
- Tieliu Shi
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Chen Suo
  - Department of Epidemiology, School of Public Health, Fudan University, Shanghai, 200032, China
- Ying Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Renee L Sears
  - Velsera, 6 Cityplace Dr Suite 550, Creve Coeur, MO, 63141, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Wendell Jones
  - Q squared Solutions Genomics, 2400 Ellis Road, Durham, NC, 27703, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
2
Cannon S, Williams M, Gunning AC, Wright CF. Evaluation of in silico pathogenicity prediction tools for the classification of small in-frame indels. BMC Med Genomics 2023; 16:36. [PMID: 36855133 PMCID: PMC9972633 DOI: 10.1186/s12920-023-01454-6] [Received: 09/05/2022] [Accepted: 02/09/2023] [Indexed: 03/02/2023] Open
Abstract
BACKGROUND: The use of in silico pathogenicity predictions as evidence when interpreting genetic variants is widely accepted as part of standard variant classification guidelines. Although numerous algorithms have been developed and evaluated for classifying missense variants, in-frame insertions/deletions (indels) have been much less well studied. METHODS: We created a dataset of 3964 small (< 100 bp) indels predicted to result in in-frame amino acid insertions or deletions using data from gnomAD v3.1 (minor allele frequency of 1-5%), ClinVar and the Deciphering Developmental Disorders (DDD) study. We used this dataset to evaluate the performance of nine pathogenicity predictor tools: CADD, CAPICE, FATHMM-indel, MutPred-Indel, MutationTaster2021, PROVEAN, SIFT-indel, VEST-indel and VVP. RESULTS: Our dataset consisted of 2224 benign/likely benign and 1740 pathogenic/likely pathogenic variants from gnomAD (n = 809), ClinVar (n = 2882) and DDD (n = 273). We were able to generate scores across all tools for 91% of the variants, with areas under the ROC curve (AUC) of 0.81-0.96 based on the published recommended thresholds. To avoid biases caused by inclusion of our dataset in the tools' training data, we also evaluated just the DDD variants not present in either gnomAD or ClinVar (70 pathogenic and 81 benign). Using this subset, the AUC of all tools decreased substantially to 0.64-0.87. Several of the tools performed similarly; however, VEST-indel had the highest AUCs of 0.93 (full dataset) and 0.87 (DDD subset). CONCLUSIONS: Algorithms designed for predicting the pathogenicity of in-frame indels perform well enough to aid clinical variant classification in a similar manner to missense prediction tools.
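As an illustration of the AUC metric reported above, the following is a minimal sketch of computing the area under the ROC curve from predictor scores and known labels. It uses the pairwise-ranking (Mann-Whitney) formulation and is not the authors' evaluation code. It assumes that higher scores indicate greater predicted pathogenicity; the `roc_auc` function and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (pathogenic, benign) pairs ranked correctly.

    labels: 1 for pathogenic, 0 for benign.
    scores: predictor output; higher is assumed to mean more pathogenic.
    Ties between a pathogenic and a benign score count as half a correct pair.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not values from the study).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]

auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop keeps the definition transparent; for datasets of thousands of variants a rank-based implementation would be faster and would give the same value.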
Affiliation(s)
- S Cannon
  - Department of Clinical and Biomedical Sciences (Medical School), Faculty of Health and Life Sciences, University of Exeter, Research, Innovation, Learning and Development Building, Royal Devon and Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- M Williams
  - Department of Clinical and Biomedical Sciences (Medical School), Faculty of Health and Life Sciences, University of Exeter, Research, Innovation, Learning and Development Building, Royal Devon and Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- A C Gunning
  - Department of Clinical and Biomedical Sciences (Medical School), Faculty of Health and Life Sciences, University of Exeter, Research, Innovation, Learning and Development Building, Royal Devon and Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK
- C F Wright
  - Department of Clinical and Biomedical Sciences (Medical School), Faculty of Health and Life Sciences, University of Exeter, Research, Innovation, Learning and Development Building, Royal Devon and Exeter Hospital, Barrack Road, Exeter, EX2 5DW, UK