1
Gong B, Li D, Zhang Y, Kusko R, Lababidi S, Cao Z, Chen M, Chen N, Chen Q, Chen Q, Dai J, Gan Q, Gao Y, Guo M, Hariani G, He Y, Hou W, Jiang H, Kushwaha G, Li JL, Li J, Li Y, Liu LC, Liu R, Liu S, Meriaux E, Mo M, Moore M, Moss TJ, Niu Q, Patel A, Ren L, Saremi NF, Shang E, Shang J, Song P, Sun S, Urban BJ, Wang D, Wang S, Wen Z, Xiong X, Yang J, Yin L, Zhang C, Zhang R, Bhandari A, Cai W, Eterovic AK, Megherbi DB, Shi T, Suo C, Yu Y, Zheng Y, Novoradovskaya N, Sears RL, Shi L, Jones W, Tong W, Xu J. Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project. Sci Rep 2024; 14:7028. [PMID: 38528062 DOI: 10.1038/s41598-024-57439-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 03/18/2024] [Indexed: 03/27/2024]
Abstract
Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known-positive set contained only a limited number of indels. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review completed by 42 reviewers, two advisors, and a judging panel of three researchers enriched the known indel set with an additional 516 indels. The extended benchmarking indel set spans a wide range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of low VAF values. Indel length also varied, but the majority were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although the robust study design and meticulous human review support high confidence in these indels, this extensive set has not undergone orthogonal validation. The extended benchmarking indel set, together with the indels in the previously published known-positive set, served as the truth set for benchmarking indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can help improve the performance of these pipelines.
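As a rough illustration of how a truth set like this is used to score a pipeline, the sketch below compares a pipeline's indel calls against known indels by exact (chrom, pos, ref, alt) match and reports precision, recall, and F1, plus recall stratified at a 20% VAF cutoff. This is a minimal sketch under stated assumptions, not the method used in the precisionFDA challenge: the names `benchmark`, `recall_by_vaf`, and `Metrics` are hypothetical, and it assumes indels are already normalized and calls are restricted to the benchmark regions. Dedicated comparison tools such as hap.py or RTG vcfeval also reconcile differing indel representations, which this sketch skips.

```python
from dataclasses import dataclass

# An indel keyed as (chrom, pos, ref, alt), VCF-style. This sketch assumes the
# variants have already been normalized (left-aligned and trimmed) so that
# equivalent representations compare equal as tuples.
Indel = tuple[str, int, str, str]


@dataclass
class Metrics:
    tp: int  # called and in the truth set
    fp: int  # called but not in the truth set
    fn: int  # in the truth set but not called

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def benchmark(calls: set[Indel], truth: set[Indel]) -> Metrics:
    """Exact-match comparison of pipeline calls against a truth set.

    Assumes `calls` has already been restricted to the benchmark regions;
    otherwise every call outside those regions would count as a false positive.
    """
    return Metrics(tp=len(calls & truth), fp=len(calls - truth), fn=len(truth - calls))


def recall_by_vaf(calls: set[Indel], truth_vaf: dict[Indel, float],
                  cutoff: float = 0.20) -> dict[str, float]:
    """Recall for truth indels below vs. at-or-above a VAF cutoff."""
    low = {k for k, vaf in truth_vaf.items() if vaf < cutoff}
    high = set(truth_vaf) - low
    return {
        "low": len(low & calls) / len(low) if low else 0.0,
        "high": len(high & calls) / len(high) if high else 0.0,
    }
```

The 20% cutoff is chosen only because the abstract reports that 87% of the indels fall below a VAF of 20% in Sample A; splitting recall this way shows whether a pipeline misses the low-VAF indels that dominate this set.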
Affiliation(s)
- Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Yifan Zhang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Rebecca Kusko
- Cellino Bio, 750 Main Street, Cambridge, MA, 02143, USA
- Samir Lababidi
- Office of Data Analytics and Research, Office of Digital Transformation, Office of the Commissioner, U.S. Food and Drug Administration, Silver Spring, MD, 20993, USA
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Mingyang Chen
- Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Ning Chen
- iGeneTech Bioscience Co., Ltd., 8 Shengmingyuan Rd., Changping, Beijing, China
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Jiacheng Dai
- Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Qiang Gan
- Clinical Diagnostics Division, Thermo Fisher Scientific, 46500 Kato Rd., Fremont, CA, 94538, USA
- Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Mingkun Guo
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Gunjan Hariani
- Q squared Solutions Genomics, 2400 Ellis Road, Durham, NC, 27703, USA
- Yujie He
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Garima Kushwaha
- Guardant Health, Inc., 505 Penobscot Drive, Redwood City, CA, 94063, USA
- Jian-Liang Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Jianying Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, 27709, USA
- Yulan Li
- College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Liang-Chun Liu
- Clinical Diagnostics Division, Thermo Fisher Scientific, 46500 Kato Rd., Fremont, CA, 94538, USA
- Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Shiming Liu
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Edwin Meriaux
- CMINDS Research Center, University of Massachusetts, Lowell, MA, 01854, USA
- Mengqing Mo
- Department of Epidemiology, School of Public Health, Fudan University, Shanghai, 200032, China
- Tyler J Moss
- Eurofins Viracor, LLC, 18000 W 99th St., Lenexa, KS, 66219, USA
- Quanne Niu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ananddeep Patel
- Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Nedda F Saremi
- Agilent Technologies, Inc., 11011 N Torrey Pines Rd., La Jolla, CA, 92037, USA
- Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ping Song
- Cancer Genomics Laboratory, Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX, 77030, USA
- Siqi Sun
- ResearchDx, Irvine, CA, 92618, USA
- Brent J Urban
- Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Danke Wang
- Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Shangzi Wang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
- Xiangyi Xiong
- College of Life Sciences, Shanghai Normal University, Shanghai, 200234, China
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Lihui Yin
- PathGroup, Nashville, TN, 37217, USA
- Chao Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Wanshi Cai
- iGeneTech Bioscience Co., Ltd., 8 Shengmingyuan Rd., Changping, Beijing, China
- Agda Karina Eterovic
- Eurofins Viracor Biopharma Services, Inc., 18000 W 99th St., Lenexa, KS, 66219, USA
- Dalila B Megherbi
- CMINDS Research Center, University of Massachusetts, Lowell, MA, 01854, USA
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Chen Suo
- Department of Epidemiology, School of Public Health, Fudan University, Shanghai, 200032, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Renee L Sears
- Velsera, 6 Cityplace Dr Suite 550, Creve Coeur, MO, 63141, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438, China
- Wendell Jones
- Q squared Solutions Genomics, 2400 Ellis Road, Durham, NC, 27703, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
8
Miller EM, Patterson NE, Gressel GM, Karabakhtsian RG, Bejerano-Sagie M, Ravi N, Maslov A, Quispe-Tintaya W, Wang T, Lin J, Smith HO, Goldberg GL, Kuo DYS, Montagna C. Utility of a custom designed next generation DNA sequencing gene panel to molecularly classify endometrial cancers according to The Cancer Genome Atlas subgroups. BMC Med Genomics 2020; 13:179. [PMID: 33256706 PMCID: PMC7706212 DOI: 10.1186/s12920-020-00824-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Accepted: 11/12/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND The Cancer Genome Atlas identified four molecular subgroups of endometrial cancer with survival differences based on whole genome, transcriptomic, and proteomic characterization. Clinically accessible algorithms that reproduce this classification are needed. Our aim was to determine whether targeted sequencing alone allows molecular classification of endometrial cancer.
METHODS Using a custom-designed 156-gene panel, we analyzed 47 endometrial cancers and matching non-tumor tissue. Variants were annotated for pathogenicity, and medical records were reviewed for clinicopathologic variables. Using molecular characteristics, tumors were classified into four subgroups. Group 1 included patients with > 570 unfiltered somatic variants, > 9 cytosine-to-adenine nucleotide substitutions per sample, and < 1 cytosine-to-guanine nucleotide substitution per sample. Group 2 included patients with any somatic mutation in MSH2, MSH6, MLH1, or PMS2. Group 3 included patients with TP53 mutations but no mutation in mismatch repair genes. The remaining patients were classified as group 4. Analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, North Carolina, USA).
RESULTS Endometrioid endometrial cancers had more candidate variants of potential pathogenic interest than uterine serous cancers (median 6, IQR 4-13 vs. median 2, IQR 2-3; p < 0.01). PTEN (82% vs. 15%, p < 0.01) and PIK3CA (74% vs. 23%, p < 0.01) mutations were more frequent in endometrioid than in serous carcinomas, whereas TP53 mutations (18% vs. 77%, p < 0.01) were more frequent in serous carcinomas. Visual inspection of the number of unfiltered somatic variants per sample identified six grade 3 endometrioid samples with high tumor mutational burden, all of which carried POLE mutations, most commonly P286R and V411L. Among the grade 3 endometrioid carcinomas, those with POLE mutations were less likely than those with low tumor mutational burden to have risk factors necessitating adjuvant treatment. Targeted sequencing was unable to assign samples to the microsatellite unstable, copy number low, and copy number high subgroups.
CONCLUSIONS Targeted sequencing can predict the presence of POLE mutations based on tumor mutational burden. However, targeted sequencing alone is inadequate to classify endometrial cancers into the molecular subgroups identified by The Cancer Genome Atlas.
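The four-group rule described in METHODS can be written out as a small decision function. This is a sketch under stated assumptions, not the authors' code (their analyses were run in SAS): the `TumorProfile` fields and the `classify` name are hypothetical, and checking the groups in numeric order (1, then 2, then 3) is an assumption, since the abstract does not say how a tumor meeting both the group 1 and group 2 criteria was assigned.

```python
from dataclasses import dataclass, field

# Mismatch repair genes named in the group 2 criterion.
MMR_GENES = {"MSH2", "MSH6", "MLH1", "PMS2"}


@dataclass
class TumorProfile:
    # Hypothetical per-sample summary; field names are illustrative only.
    unfiltered_somatic_variants: int
    c_to_a_substitutions: int
    c_to_g_substitutions: int
    mutated_genes: set[str] = field(default_factory=set)  # genes with somatic mutations


def classify(t: TumorProfile) -> int:
    """Assign group 1-4 using the thresholds quoted in the abstract.

    Groups are tested in numeric order; that precedence is an assumption.
    """
    if (t.unfiltered_somatic_variants > 570
            and t.c_to_a_substitutions > 9
            and t.c_to_g_substitutions < 1):
        return 1  # high mutation count with C>A enrichment (linked to POLE in the results)
    if t.mutated_genes & MMR_GENES:
        return 2  # any somatic mutation in a mismatch repair gene
    if "TP53" in t.mutated_genes:
        return 3  # TP53 mutated; MMR mutations were already excluded above
    return 4  # everything else
```

Written this way, the rule makes the study's conclusion concrete: it depends only on panel-level mutation counts and gene hits, so it cannot observe copy number status, which is consistent with the finding that targeted sequencing could not recover the copy number low and copy number high subgroups.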
Affiliation(s)
- Eirwen M Miller
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology and Women's Health, Montefiore Medical Center, Bronx, NY, 10461, USA
- Nicole E Patterson
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
- Gregory M Gressel
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology and Women's Health, Montefiore Medical Center, Bronx, NY, 10461, USA
- Michal Bejerano-Sagie
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
- Nivedita Ravi
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
- Alexander Maslov
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
- Wilber Quispe-Tintaya
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
- Tao Wang
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Juan Lin
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Harriet O Smith
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology and Women's Health, Montefiore Medical Center, Bronx, NY, 10461, USA
- Gary L Goldberg
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology and Women's Health, Montefiore Medical Center, Bronx, NY, 10461, USA
- Department of Obstetrics and Gynecology, Northwell Health, LIJ Medical Center, New Hyde Park, NY, 11040, USA
- Dennis Y S Kuo
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology and Women's Health, Montefiore Medical Center, Bronx, NY, 10461, USA
- Cristina Montagna
- Department of Genetics, Albert Einstein College of Medicine, Price Center/Block Research Pavilion, Room 401, 1301 Morris Park Avenue, Bronx, NY, 10461, USA
11
Pagel KA, Antaki D, Lian A, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome. PLoS Comput Biol 2019; 15:e1007112. [PMID: 31199787 PMCID: PMC6594643 DOI: 10.1371/journal.pcbi.1007112] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 11/16/2018] [Revised: 06/26/2019] [Accepted: 05/17/2019] [Indexed: 11/19/2022]
Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
Collapse
Affiliation(s)
- Kymberleigh A. Pagel
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
- Danny Antaki
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- AoJie Lian
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- Jonathan Sebat
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Lilia M. Iakoucheva
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Predrag Radivojac
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America