1. Vaisband M, Schubert M, Gassner FJ, Geisberger R, Greil R, Zaborsky N, Hasenauer J. Validation of genetic variants from NGS data using deep convolutional neural networks. BMC Bioinformatics 2023; 24:158. [PMID: 37081386 PMCID: PMC10116675 DOI: 10.1186/s12859-023-05255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
Accurate somatic variant calling from next-generation sequencing data is one of the most important tasks in personalised cancer therapy. The sophistication of the available technologies is ever-increasing, yet manual candidate refinement is still a necessary step in state-of-the-art processing pipelines. This limits reproducibility and introduces a bottleneck with respect to scalability. We demonstrate that the validation of genetic variants can be improved using a machine learning approach resting on a Convolutional Neural Network, trained using existing human annotation. In contrast to existing approaches, we introduce a way in which contextual data from sequencing tracks can be included in the automated assessment. A rigorous evaluation shows that the resulting model is robust and performs on par with trained researchers following a published standard operating procedure.
Affiliation(s)
- Marc Vaisband
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria.
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany.
- Maria Schubert
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Franz Josef Gassner
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Roland Geisberger
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Richard Greil
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Nadja Zaborsky
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Jan Hasenauer
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
2. Rockweiler NB, Ramu A, Nagirnaja L, Wong WH, Noordam MJ, Drubin CW, Huang N, Miller B, Todres EZ, Vigh-Conrad KA, Zito A, Small KS, Ardlie KG, Cohen BA, Conrad DF. The origins and functional effects of postzygotic mutations throughout the human life span. Science 2023; 380:eabn7113. [PMID: 37053313 PMCID: PMC11246725 DOI: 10.1126/science.abn7113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 03/17/2023] [Indexed: 04/15/2023]
Abstract
Postzygotic mutations (PZMs) begin to accrue in the human genome immediately after fertilization, but how and when PZMs affect development and lifetime health remain unclear. To study the origins and functional consequences of PZMs, we generated a multitissue atlas of PZMs spanning 54 tissue and cell types from 948 donors. Nearly half the variation in mutation burden among tissue samples can be explained by measured technical and biological effects, and 9% can be attributed to donor-specific effects. Through phylogenetic reconstruction of PZMs, we found that their type and predicted functional impact vary during prenatal development, across tissues, and through the germ cell life cycle. Thus, methods for interpreting effects across the body and the life span are needed to fully understand the consequences of genetic variants.
Affiliation(s)
- Nicole B Rockweiler
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Avinash Ramu
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Liina Nagirnaja
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR 97006, USA
- Wing H Wong
- Department of Pediatrics, Division of Hematology and Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Michiel J Noordam
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Casey W Drubin
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Ni Huang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Brian Miller
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR 97006, USA
- Ellen Z Todres
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Katinka A Vigh-Conrad
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR 97006, USA
- Antonino Zito
- Department of Twin Research and Genetic Epidemiology, King's College London, London SE1 7EH, UK
- Kerrin S Small
- Department of Twin Research and Genetic Epidemiology, King's College London, London SE1 7EH, UK
- Barak A Cohen
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Donald F Conrad
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Genetics, Oregon National Primate Research Center, Oregon Health & Science University, Beaverton, OR 97006, USA
- Center for Embryonic Cell and Gene Therapy, Oregon Health & Science University, Portland, OR 97239, USA
3. Chang TC, Xu K, Cheng Z, Wu G. Somatic and Germline Variant Calling from Next-Generation Sequencing Data. Advances in Experimental Medicine and Biology 2022; 1361:37-54. [DOI: 10.1007/978-3-030-91836-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
4. A Retrospective Statistical Validation Approach for Panel of Normal-Based Single-Nucleotide Variant Detection in Tumor Sequencing. J Mol Diagn 2022; 24:41-47. [PMID: 34974877 DOI: 10.1016/j.jmoldx.2021.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2021] [Revised: 08/28/2021] [Accepted: 09/28/2021] [Indexed: 11/22/2022] Open
Abstract
An important step of somatic variant calling algorithms for deep sequencing data is quantifying the errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. The site-specific error rates are often estimated from a panel of normal samples, which has limited size and is subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The validation method extracts the high-quality reads from the Binary Alignment/Map (BAM) files, finds the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful in detecting low-frequency variants that may be missed by traditional panel of normal-based SNV methods. The proposed method makes it possible to launch a simple and parallel validation pipeline for SNV calling and improve the detection limit.
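The idea of site-specific error estimation from a panel of normals can be illustrated with a minimal sketch. This is not the statistical model proposed in the paper; it assumes a pooled error rate per site and an exact binomial tail test, and the function names, the pseudocount, and the threshold `alpha` are all hypothetical choices made for illustration.

```python
from math import exp, lgamma, log


def site_error_rate(normal_counts):
    """Pooled alt-read fraction at one site across normal samples.

    normal_counts: list of (alt_reads, depth) tuples, one per normal.
    The +1/+2 pseudocount (an assumption) keeps the rate inside (0, 1).
    """
    alt = sum(a for a, _ in normal_counts)
    depth = sum(d for _, d in normal_counts)
    return (alt + 1) / (depth + 2)


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), computed in log space."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


def call_snv(alt_reads, depth, normal_counts, alpha=1e-6):
    """Flag a site when the alt-read count is unlikely under the site error rate."""
    p_err = site_error_rate(normal_counts)
    p_value = binomial_tail(alt_reads, depth, p_err)
    return p_value < alpha, p_value


# Hypothetical panel: 20 normals, each with 1 alt read at depth 1000.
normals = [(1, 1000)] * 20
print(call_snv(15, 1000, normals))  # strong excess of alt reads
print(call_snv(2, 1000, normals))   # compatible with background error
```

A small panel makes `site_error_rate` noisy, which is the sampling problem the abstract points to as motivation for drawing on historical data.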
5. Tellaetxe-Abete M, Calvo B, Lawrie C. Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data. NAR Genom Bioinform 2021; 3:lqab092. [PMID: 34729472 PMCID: PMC8557387 DOI: 10.1093/nargab/lqab092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/09/2020] [Revised: 09/14/2021] [Accepted: 09/29/2021] [Indexed: 12/16/2022] Open
Abstract
Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.
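The evaluation scheme named in the abstract, leave-one-sample-out cross-validation scored by AUC, can be sketched as follows. This is an illustrative outline only and does not reproduce the Ideafix R package. The helper names are hypothetical, and the AUC is computed with a simple quadratic-time pairwise comparison that suits small examples rather than millions of variants.

```python
from collections import defaultdict


def leave_one_sample_out(sample_ids):
    """Yield (held_out_sample, train_idx, test_idx).

    All variants from one sample form the test fold, so a model is never
    evaluated on variants from a sample it was trained on.
    """
    by_sample = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        by_sample[sid].append(idx)
    for sid, test_idx in by_sample.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(sample_ids)) if i not in held_out]
        yield sid, train_idx, test_idx


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three variants from two samples.
for sid, train, test in leave_one_sample_out(["a", "a", "b"]):
    print(sid, train, test)
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

Splitting by sample rather than by variant matters because variants from the same specimen share artefact patterns, so a per-variant split would leak information between folds.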
Affiliation(s)
- Borja Calvo
- Intelligent Systems Group, Computer Science Faculty, University of the Basque Country, Paseo Manuel Lardizabal, 20018 Donostia/San Sebastian, Spain
- Charles Lawrie
- Correspondence may also be addressed to Charles Lawrie. Tel: +34 943 006138;
6. Zhao X, Hu AC, Wang S, Wang X. Calling small variants using universality with Bayes-factor-adjusted odds ratios. Brief Bioinform 2021; 23:6427501. [PMID: 34791010 DOI: 10.1093/bib/bbab458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Revised: 09/26/2021] [Accepted: 10/07/2021] [Indexed: 11/12/2022] Open
Abstract
The application of next-generation sequencing in research and particularly in clinical routine requires highly accurate variant calling. Here we describe UVC, a method for calling small variants of germline or somatic origin. By unifying opposite assumptions with sublation, we discovered the following two empirical laws to improve variant calling: allele fraction at high sequencing depth is inversely proportional to the cubic root of variant-calling error rate, and odds ratios adjusted with Bayes factors can model various sequencing biases. UVC outperformed other variant callers on the GIAB germline truth sets, 192 scenarios of in silico mixtures simulating 192 combinations of tumor/normal sequencing depths and tumor/normal purities, the GIAB somatic truth sets derived from physical mixture, and the SEQC2 somatic reference sets derived from the breast-cancer cell-line HCC1395. UVC achieved 100% concordance with the manual review conducted by multiple independent researchers on a Qiagen 71-gene-panel dataset derived from 16 patients with colon adenoma. UVC outperformed other unique molecular identifier (UMI)-aware variant callers on the datasets used for publishing these variant callers. Performance was measured with sensitivity-specificity trade off for called variants. The improved variant calls generated by UVC from previously published UMI-based sequencing data provided additional insight about DNA damage repair. UVC is open-sourced under the BSD 3-Clause license at https://github.com/genetronhealth/uvc and quay.io/genetronhealth/gcc-6-3-0-uvc-0-6-0-441a694.
Affiliation(s)
- Xiaofei Zhao
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Allison C Hu
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Sizhen Wang
- Genetron Health (Beijing) Co. Ltd, Beijing 102208, China
- Xiaoyue Wang
- State Key Laboratory of Medical Molecular Biology, Center for Bioinformatics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing 100005, China
7. Diossy M, Sztupinszki Z, Krzystanek M, Borcsok J, Eklund AC, Csabai I, Pedersen AG, Szallasi Z. Strand Orientation Bias Detector to determine the probability of FFPE sequencing artifacts. Brief Bioinform 2021; 22:6278604. [PMID: 34015811 DOI: 10.1093/bib/bbab186] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/04/2021] [Revised: 03/11/2021] [Accepted: 04/22/2021] [Indexed: 12/20/2022] Open
Abstract
Formalin-fixed paraffin-embedded tissue, the most common tissue specimen stored in clinical practice, presents challenges in the analysis due to formalin-induced artifacts. Here, we present Strand Orientation Bias Detector (SOBDetector), a flexible computational platform compatible with all the common somatic SNV-calling pipelines, designed to assess the probability that a given detected mutation is an artifact. The underlying predictor mechanism is based on the posterior distribution of a Bayesian logistic regression model trained on The Cancer Genome Atlas whole exomes. SOBDetector is a freely available cross-platform program, implemented in Java 1.8.
Affiliation(s)
- Judit Borcsok
- University of Copenhagen and at the Danish Cancer Society, Copenhagen, Denmark
- István Csabai
- Department of Complex Physics, Eotvos Lorand University, Budapest, Hungary
- Zoltan Szallasi
- Boston Children's Hospital and Harvard Medical School, Boston, USA
8. Dudley JN, Hong CS, Hawari MA, Shwetar J, Sapp JC, Lack J, Shiferaw H, Johnston JJ, Biesecker LG. Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach. BMC Bioinformatics 2021; 22:181. [PMID: 33832433 PMCID: PMC8028235 DOI: 10.1186/s12859-021-04090-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 03/18/2021] [Indexed: 11/19/2022] Open
Abstract
Background: The widespread use of next-generation sequencing has identified an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants from next-generation sequencing data remains challenging. Results: Here, we present a method for Position-Based Variant Identification (PBVI) that uses empirically-derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that this method improves detection of single nucleotide mosaic variants of 0.01–0.05 variant allele fraction compared to other low-level variant callers. At depths of 600× and 1200×, we observed >85% and >95% sensitivity, respectively. In a cohort of 26 individuals with somatic overgrowth disorders PBVI showed improved signal to noise, identifying pathogenic variants in 17 individuals. Conclusion: PBVI can facilitate identification of low-level mosaic variants thus increasing the utility of next-generation sequencing data for research and diagnostic purposes. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04090-y.
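A position-based, nucleotide-specific comparison against control data can be sketched roughly as below. This is an assumption-laden illustration, not the PBVI statistic itself: it uses a plain empirical p-value with a +1 correction, and the data layout, function names, and the threshold `alpha` are hypothetical.

```python
def empirical_p_value(sample_fraction, control_fractions):
    """Share of controls whose alt fraction is at least the sample's,
    with a +1 correction so the p-value is never exactly zero."""
    exceed = sum(1 for c in control_fractions if c >= sample_fraction)
    return (exceed + 1) / (len(control_fractions) + 1)


def flag_candidates(sample, controls, alpha=0.05):
    """Return (position, nucleotide) keys whose alt fraction stands out.

    sample:   {(position, nucleotide): alt_fraction}
    controls: {(position, nucleotide): [alt_fraction, ...]}
    Keys without control data are skipped rather than guessed at.
    """
    flagged = []
    for key, fraction in sample.items():
        background = controls.get(key)
        if not background:
            continue
        if empirical_p_value(fraction, background) < alpha:
            flagged.append(key)
    return flagged


# Hypothetical controls: 40 samples with alt fractions 0.001 .. 0.040.
controls = {(100, "T"): [0.001 * i for i in range(1, 41)]}
sample = {(100, "T"): 0.05, (100, "G"): 0.5}
print(flag_candidates(sample, controls))
```

With this correction the smallest attainable p-value is 1 / (n + 1), so the number of control samples bounds how stringent the threshold can be.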
Affiliation(s)
- Jeffrey N Dudley
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Celine S Hong
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA.
- Marwan A Hawari
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Jasmine Shwetar
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Julie C Sapp
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Justin Lack
- NIAID Collaborative Bioinformatics Resource, National Institutes of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Henoke Shiferaw
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Jennifer J Johnston
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
- Leslie G Biesecker
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive Room 5140, Bethesda, MD, 20892, USA
9. Carrot-Zhang J, Soca-Chafre G, Patterson N, Thorner AR, Nag A, Watson J, Genovese G, Rodriguez J, Gelbard MK, Corrales-Rodriguez L, Mitsuishi Y, Ha G, Campbell JD, Oxnard GR, Arrieta O, Cardona AF, Gusev A, Meyerson M. Genetic Ancestry Contributes to Somatic Mutations in Lung Cancers from Admixed Latin American Populations. Cancer Discov 2021; 11:591-598. [PMID: 33268447 PMCID: PMC7933062 DOI: 10.1158/2159-8290.cd-20-1165] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 08/11/2020] [Revised: 10/26/2020] [Accepted: 11/19/2020] [Indexed: 12/24/2022]
Abstract
Inherited lung cancer risk, particularly in nonsmokers, is poorly understood. Genomic and ancestry analysis of 1,153 lung cancers from Latin America revealed striking associations between Native American ancestry and their somatic landscape, including tumor mutational burden, and specific driver mutations in EGFR, KRAS, and STK11. A local Native American ancestry risk score was more strongly correlated with EGFR mutation frequency compared with global ancestry correlation, suggesting that germline genetics (rather than environmental exposure) underlie these disparities. SIGNIFICANCE: The frequency of somatic EGFR and KRAS mutations in lung cancer varies by ethnicity, but we do not understand why. Our study suggests that the variation in EGFR and KRAS mutation frequency is associated with genetic ancestry and suggests further studies to identify germline alleles that underpin this association. See related commentary by Gomez et al., p. 534. This article is highlighted in the In This Issue feature, p. 521.
Affiliation(s)
- Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Departments of Genetics and Medicine, Harvard Medical School, Boston, Massachusetts
- Giovanny Soca-Chafre
- Personalized Medicine Laboratory, Instituto Nacional de Cancerologia, México City, México
- Nick Patterson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Departments of Genetics and Medicine, Harvard Medical School, Boston, Massachusetts
- Aaron R Thorner
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Anwesha Nag
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Jacqueline Watson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Giulio Genovese
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Departments of Genetics and Medicine, Harvard Medical School, Boston, Massachusetts
- July Rodriguez
- Foundation for Clinical and Applied Cancer Research - FICMAC, Bogotá, Colombia
- Maya K Gelbard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Luis Corrales-Rodriguez
- Medical Oncology, Hospital San Juan de Dios, San José, Costa Rica
- Centro de Investigación y Manejo del Cáncer - CIMCA, San José, Costa Rica
- Yoichiro Mitsuishi
- Division of Respiratory Medicine, Graduate School of Medicine, Juntendo University, Bunkyo-ku, Tokyo, Japan
- Gavin Ha
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Joshua D Campbell
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, Massachusetts
- Geoffrey R Oxnard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Oscar Arrieta
- Personalized Medicine Laboratory, Instituto Nacional de Cancerologia, México City, México.
- Thoracic Oncology Unit, Instituto Nacional de Cancerología, México City, México
- Andres F Cardona
- Foundation for Clinical and Applied Cancer Research - FICMAC, Bogotá, Colombia.
- Clinical and Translational Oncology Group, Clínica del Country, Bogotá, Colombia
- Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Division of Genetics, Brigham and Women's Hospital, Boston, Massachusetts
- Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts
- Departments of Genetics and Medicine, Harvard Medical School, Boston, Massachusetts
10. Carrot-Zhang J, Yao X, Devarakonda S, Deshpande A, Damrauer JS, Silva TC, Wong CK, Choi HY, Felau I, Robertson AG, Castro MAA, Bao L, Rheinbay E, Liu EM, Trieu T, Haan D, Yau C, Hinoue T, Liu Y, Shapira O, Kumar K, Mungall KL, Zhang H, Lee JJK, Berger A, Gao GF, Zhitomirsky B, Liang WW, Zhou M, Moorthi S, Berger AH, Collisson EA, Zody MC, Ding L, Cherniack AD, Getz G, Elemento O, Benz CC, Stuart J, Zenklusen JC, Beroukhim R, Chang JC, Campbell JD, Hayes DN, Yang L, Laird PW, Weinstein JN, Kwiatkowski DJ, Tsao MS, Travis WD, Khurana E, Berman BP, Hoadley KA, Robine N, Meyerson M, Govindan R, Imielinski M. Whole-genome characterization of lung adenocarcinomas lacking the RTK/RAS/RAF pathway. Cell Rep 2021; 34:108707. [PMID: 33535033 PMCID: PMC8009291 DOI: 10.1016/j.celrep.2021.108707] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/01/2020] [Revised: 09/08/2020] [Accepted: 01/08/2021] [Indexed: 12/13/2022] Open
Abstract
RTK/RAS/RAF pathway alterations (RPAs) are a hallmark of lung adenocarcinoma (LUAD). In this study, we use whole-genome sequencing (WGS) of 85 cases found to be RPA(-) by previous studies from The Cancer Genome Atlas (TCGA) to characterize the minority of LUADs lacking apparent alterations in this pathway. We show that WGS analysis uncovers RPA(+) in 28 (33%) of the 85 samples. Among the remaining 57 cases, we observe focal deletions targeting the promoter or transcription start site of STK11 (n = 7) or KEAP1 (n = 3), and promoter mutations associated with the increased expression of ILF2 (n = 6). We also identify complex structural variations associated with high-level copy number amplifications. Moreover, an enrichment of focal deletions is found in TP53 mutant cases. Our results indicate that RPA(-) cases demonstrate tumor suppressor deletions and genome instability, but lack unique or recurrent genetic lesions compensating for the lack of RPAs. Larger WGS studies of RPA(-) cases are required to understand this important LUAD subset.
Affiliation(s)
- Jian Carrot-Zhang
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
- Xiaotong Yao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA; New York Genome Center, New York, NY, USA; Tri-institutional Ph.D. Program in Computational Biology and Medicine, New York, NY, USA; Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Siddhartha Devarakonda
- Section of Medical Oncology, Division of Oncology, Washington University School of Medicine, St. Louis, MO, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Aditya Deshpande
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA; New York Genome Center, New York, NY, USA; Tri-institutional Ph.D. Program in Computational Biology and Medicine, New York, NY, USA; Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Jeffrey S Damrauer
- Department of Genetics, Computational Medicine Program, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Tiago Chedraoui Silva
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Christopher K Wong
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Hyo Young Choi
- University of Tennessee Health Science Center, UTHSC Center for Cancer Research, TN, USA
- Ina Felau
- National Cancer Institute, Bethesda, MD, USA
- A Gordon Robertson
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Mauro A A Castro
- Bioinformatics and Systems Biology Laboratory, Federal University of Paraná, Curitiba, PR, Brazil
- Lisui Bao
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
- Esther Rheinbay
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Eric Minwei Liu
- Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Tuan Trieu
- Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- David Haan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Christina Yau
- University of California, San Francisco, San Francisco, CA, USA; Buck Institute for Research on Aging, Novato, CA, USA
- Yuexin Liu
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ofer Shapira
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kiran Kumar
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Karen L Mungall
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Hailei Zhang
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ashton Berger
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Galen F Gao
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Binyamin Zhitomirsky
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Wen-Wei Liang
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA
- Meng Zhou
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
- Alice H Berger
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Li Ding
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA
- Andrew D Cherniack
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Olivier Elemento
- Tri-institutional Ph.D. Program in Computational Biology and Medicine, New York, NY, USA; Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Josh Stuart
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Rameen Beroukhim
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Jason C Chang
- Thoracic Pathology, Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Joshua D Campbell
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- D Neil Hayes
- University of Tennessee Health Science Center, UTHSC Center for Cancer Research, TN, USA
- Lixing Yang
- Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
- John N Weinstein
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ming S Tsao
- Department of Pathology, University Health Network, Princess Margaret Cancer Centre, Toronto, ON, Canada
- William D Travis
- Thoracic Pathology, Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ekta Khurana
- Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Benjamin P Berman
- Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University, Jerusalem, Israel
- Katherine A Hoadley
- Department of Genetics, Computational Medicine Program, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Matthew Meyerson
- Dana-Farber Cancer Institute, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA.
| | - Ramaswamy Govindan
- Section of Medical Oncology, Division of Oncology, Washington University School of Medicine, St. Louis, MO, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA.
| | - Marcin Imielinski
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA; New York Genome Center, New York, NY, USA; Caryl and Israel Englander Institute for Precision Medicine and Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA.
| |
|
11
|
Casiraghi N, Orlando F, Ciani Y, Xiang J, Sboner A, Elemento O, Attard G, Beltran H, Demichelis F, Romanel A. ABEMUS: platform-specific and data-informed detection of somatic SNVs in cfDNA. Bioinformatics 2020; 36:2665-2674. [PMID: 31922552 PMCID: PMC7203757 DOI: 10.1093/bioinformatics/btaa016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/23/2019] [Revised: 12/04/2019] [Accepted: 01/07/2020] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION: The use of liquid biopsies for cancer patients enables the non-invasive tracking of treatment response and tumor dynamics through single or serial blood draws. Next-generation sequencing assays allow for the simultaneous interrogation of extended sets of somatic single-nucleotide variants (SNVs) in circulating cell-free DNA (cfDNA), a mixture of DNA molecules originating from both normal and tumor tissue cells. However, low circulating tumor DNA (ctDNA) fractions, together with sequencing background noise and potential tumor heterogeneity, challenge the ability to confidently call SNVs. RESULTS: We present a computational methodology, called Adaptive Base Error Model in Ultra-deep Sequencing data (ABEMUS), which combines platform-specific genetic knowledge and empirical signal to readily detect and quantify somatic SNVs in cfDNA. We tested the capability of our method to analyze data generated using different platforms with distinct sequencing error properties, and we compared the performance of ABEMUS with that of other popular SNV callers on both synthetic and real cancer patient sequencing data. Results show that ABEMUS performs better in most of the tested conditions, proving its reliability in calling somatic SNVs with low variant allele frequencies in plasma samples with low ctDNA levels. AVAILABILITY AND IMPLEMENTATION: ABEMUS is cross-platform and can be installed as an R package. The source code is maintained on GitHub at http://github.com/cibiobcg/abemus, and it is also available at the official CRAN R repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicola Casiraghi
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
| | - Francesco Orlando
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
| | - Yari Ciani
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
| | - Jenny Xiang
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine
- Genomics and Epigenomics Core Facility
| | - Andrea Sboner
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Olivier Elemento
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Gerhardt Attard
- UCL Cancer Institute, University College London, London WC1E 6BT, UK
| | - Himisha Beltran
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine
- Department of Medical Oncology, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Department of Medicine, Division of Hematology and Medical Oncology, Weill Cornell Medicine, New York, NY 10021, USA
| | - Francesca Demichelis
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
- Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Alessandro Romanel
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
| |
|
12
|
Mannakee BK, Gutenkunst RN. BATCAVE: calling somatic mutations with a tumor- and site-specific prior. NAR Genom Bioinform 2020; 2:lqaa004. [PMID: 32051931 PMCID: PMC7003682 DOI: 10.1093/nargab/lqaa004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2019] [Revised: 01/13/2020] [Accepted: 01/23/2020] [Indexed: 02/06/2023] Open
Abstract
Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis and tumor evolution. Mutations at low allelic frequency, those present in only a small portion of tumor cells, are particularly difficult to detect. Many algorithms have been developed to detect such mutations, but none models a key aspect of tumor biology: every tumor has its own profile of mutation types that it tends to generate. We present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), an algorithm that first learns the individual tumor mutational profile and mutation rate, then uses them in a prior for evaluating potential mutations. We also present an R implementation of the algorithm, built on the popular caller MuTect. Using simulations, we show that adding the BATCAVE algorithm to MuTect improves variant detection. It also improves the calibration of posterior probabilities, enabling a more principled tradeoff between precision and recall. We also show that BATCAVE performs well on real data. Our implementation is computationally inexpensive and straightforward to incorporate into existing MuTect pipelines. More broadly, the algorithm can be added to other variant callers, and it can be extended to include additional biological features that affect mutation generation.
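The idea of a tumor- and site-specific prior can be illustrated with a minimal Python sketch. It is not the BATCAVE implementation (which the abstract describes as an R package built on MuTect): the function names, the example mutational profile, and the way the per-site rate is scaled by context enrichment are all assumptions made for illustration. The only standard ingredient is Bayes' rule in odds form, where posterior odds equal the caller's likelihood ratio times the prior odds.

```python
def site_prior(mutation_rate, profile, context):
    """Hypothetical site-specific prior (illustration only, not BATCAVE's formula).

    mutation_rate: assumed per-site probability of a somatic mutation.
    profile: dict mapping mutation context -> fraction of the tumor's mutations.
    The rate is scaled by how enriched the context is relative to a flat profile.
    """
    uniform_share = 1.0 / len(profile)
    enrichment = profile[context] / uniform_share
    return min(1.0, mutation_rate * enrichment)


def posterior_probability(bayes_factor, prior):
    """Bayes' rule in odds form: posterior odds = Bayes factor * prior odds.

    Requires 0 <= prior < 1.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Made-up three-context profile; real profiles use many more contexts.
    profile = {"TCG>TTG": 0.6, "ACA>AAA": 0.3, "ATA>ACA": 0.1}
    evidence = 1e5  # assumed likelihood ratio reported by a variant caller
    for context in profile:
        prior = site_prior(1e-6, profile, context)
        print(context, posterior_probability(evidence, prior))
```

With identical read evidence, a candidate in a context the tumor mutates often receives a higher posterior than one in a rarely mutated context, which is the behavior the abstract attributes to the learned prior.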
Affiliation(s)
- Brian K Mannakee
- Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85721, USA
| | - Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| |
|
13
|
Xu C, Gu X, Padmanabhan R, Wu Z, Peng Q, DiCarlo J, Wang Y. smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers. Bioinformatics 2020; 35:1299-1309. [PMID: 30192920 PMCID: PMC6477992 DOI: 10.1093/bioinformatics/bty790] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 03/13/2018] [Revised: 08/03/2018] [Accepted: 09/05/2018] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling. RESULTS: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit (decreased from 1% to 0.5%), better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2's superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data. AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
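The kind of test the abstract describes, asking whether an observed allele frequency is significantly above a background error rate, can be sketched as follows. This is a simplified stand-in, not smCounter2's model: it assumes a plain one-sided binomial test on UMI counts, and the function names, the background rate, and the significance threshold are hypothetical values chosen for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def is_above_background(alt_umis, total_umis, background_rate, alpha=1e-6):
    """One-sided test: is the variant UMI count higher than errors alone explain?

    alpha is an assumed significance threshold, not a value from smCounter2.
    Returns (significant, p_value).
    """
    p_value = binom_upper_tail(alt_umis, total_umis, background_rate)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # 20 of 4000 UMIs (0.5% allele frequency) against an assumed 0.05% error rate.
    print(is_above_background(20, 4000, 0.0005))
    # 3 of 4000 UMIs is compatible with background noise under the same rate.
    print(is_above_background(3, 4000, 0.0005))
```

Working in log space keeps the binomial coefficients from overflowing at the depths typical of targeted sequencing. A production caller would also need a background model estimated from data, as the abstract notes.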
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| | - Xiujing Gu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| | | | - Zhong Wu
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| | - Quan Peng
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| | - John DiCarlo
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| | - Yexun Wang
- Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA
| |
|
14
|
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022] Open
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
Affiliation(s)
- Áron Bartha
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
| | - Balázs Győrffy
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
| |
|
15
|
Ferretti L, Tennakoon C, Silesian A, Ribeca GFA. SiNPle: Fast and Sensitive Variant Calling for Deep Sequencing Data. Genes (Basel) 2019; 10:E561. [PMID: 31349684 PMCID: PMC6722845 DOI: 10.3390/genes10080561] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/13/2019] [Revised: 07/08/2019] [Accepted: 07/16/2019] [Indexed: 11/28/2022] Open
Abstract
Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed for specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), a fast and effective software tool for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into consideration individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stage, the prior distribution of variant frequencies and their strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.
Affiliation(s)
- Luca Ferretti
- Integrative Biology and Bioinformatics, The Pirbright Institute, Woking GU24 0NF, UK
| | - Chandana Tennakoon
- Integrative Biology and Bioinformatics, The Pirbright Institute, Woking GU24 0NF, UK
| | - Adrian Silesian
- Integrative Biology and Bioinformatics, The Pirbright Institute, Woking GU24 0NF, UK
| | | |
|
16
|
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables. Comput Struct Biotechnol J 2019; 17:561-569. [PMID: 31049166 PMCID: PMC6482431 DOI: 10.1016/j.csbj.2019.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/21/2018] [Revised: 03/12/2019] [Accepted: 04/03/2019] [Indexed: 01/10/2023] Open
Abstract
Deep sequencing genomic analysis is becoming increasingly common in clinical research and practice, enabling accurate identification of diagnostic, prognostic, and predictive determinants. Variant calling, distinguishing between true mutations and experimental errors, is a central task of genomic analysis and often requires sophisticated statistical, computational, and/or heuristic techniques. Although variant callers seek to overcome noise inherent in biological experiments, variant calling can be significantly affected by outside factors including those used to prepare, store, and analyze samples. The goal of this review is to discuss known experimental features, such as sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and evaluate their effect on variant caller selection and optimization.
|
17
|
Viswanathan SR, Ha G, Hoff AM, Wala JA, Carrot-Zhang J, Whelan CW, Haradhvala NJ, Freeman SS, Reed SC, Rhoades J, Polak P, Cipicchio M, Wankowicz SA, Wong A, Kamath T, Zhang Z, Gydush GJ, Rotem D, Love JC, Getz G, Gabriel S, Zhang CZ, Dehm SM, Nelson PS, Van Allen EM, Choudhury AD, Adalsteinsson VA, Beroukhim R, Taplin ME, Meyerson M. Structural Alterations Driving Castration-Resistant Prostate Cancer Revealed by Linked-Read Genome Sequencing. Cell 2018; 174:433-447.e19. [PMID: 29909985 PMCID: PMC6046279 DOI: 10.1016/j.cell.2018.05.036] [Citation(s) in RCA: 235] [Impact Index Per Article: 39.2] [Received: 12/13/2017] [Revised: 03/09/2018] [Accepted: 05/16/2018] [Indexed: 01/17/2023]
Abstract
Nearly all prostate cancer deaths are from metastatic castration-resistant prostate cancer (mCRPC), but there have been few whole-genome sequencing (WGS) studies of this disease state. We performed linked-read WGS on 23 mCRPC biopsy specimens and analyzed cell-free DNA sequencing data from 86 patients with mCRPC. In addition to frequent rearrangements affecting known prostate cancer genes, we observed complex rearrangements of the AR locus in most cases. Unexpectedly, these rearrangements include highly recurrent tandem duplications involving an upstream enhancer of AR in 70%-87% of cases compared with <2% of primary prostate cancers. A subset of cases displayed AR or MYC enhancer duplication in the context of a genome-wide tandem duplicator phenotype associated with CDK12 inactivation. Our findings highlight the complex genomic structure of mCRPC, nominate alterations that may inform prostate cancer treatment, and suggest that additional recurrent events in the non-coding mCRPC genome remain to be discovered.
Affiliation(s)
- Srinivas R Viswanathan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Gavin Ha
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Andreas M Hoff
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
| | - Jeremiah A Wala
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Christopher W Whelan
- Harvard Medical School, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nicholas J Haradhvala
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
| | - Samuel S Freeman
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Sarah C Reed
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Justin Rhoades
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Paz Polak
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | | | - Stephanie A Wankowicz
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Alicia Wong
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Tushar Kamath
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Zhenwei Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gregory J Gydush
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Denisse Rotem
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - J Christopher Love
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Gad Getz
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Stacey Gabriel
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Cheng-Zhong Zhang
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Cambridge, MA, USA
| | - Scott M Dehm
- Masonic Cancer Center, University of Minnesota, Minneapolis, MN, USA
| | - Peter S Nelson
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Eliezer M Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Atish D Choudhury
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Viktor A Adalsteinsson
- Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Koch Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Rameen Beroukhim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Mary-Ellen Taplin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA.
| |
|
18
|
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023] Open
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of next-generation sequencing technology. A collection of variant calling pipelines has been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope of providing a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked, with inconsistent performance across these studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. Finally, we will discuss emerging trends and future directions of variant calling algorithms.
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
| |
|