1
|
Abstract 215: A novel enzymatic library preparation workflow that dramatically reduces artifacts associated with damaged FFPE samples. Cancer Res 2023. [DOI: 10.1158/1538-7445.am2023-215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
Abstract
Clinical oncology relies heavily on formalin-fixed, paraffin-embedded (FFPE) tissue samples for histology and molecular characterization. The chemical and physical modifications of nucleic acids introduced during fixation, storage and purification negatively impact molecular profiling and vary from sample to sample. Conventional sonication and ligation-based library preparation is considered the gold-standard approach for FFPE samples, but it is time-consuming and expensive. Importantly, these processes introduce artifacts that impact downstream analysis and interpretation (Haile et al., 2019).
We have developed a novel fragmentation chemistry that virtually eliminates hairpin artifacts, achieving levels on par with non-FFPE control samples. Our fragmentation method is highly scalable, exhibits minimal sequence bias, and reduces cost and workflow inefficiencies associated with mechanical shearing. In this study, we developed a unified method for library preparation from FFPE samples which produces similar insert sizes across variable input mass and quality of FFPE samples. We carefully evaluated the performance of this method relative to a sonication-based approach employing the KAPA HyperPrep kit. Libraries were constructed from 50 to 200 ng of FFPE DNA, inputs typically used for NGS, ranging from low- to high-quality as assessed using a qPCR-based method and DNA integrity number (DIN). Targeted sequencing was performed using a 37 kb custom oncology hybridization capture panel to investigate molecular complexity.
Our workflow virtually eliminated hairpin artifacts that were present in up to 4.5% of reads in sonication-based libraries. Soft-clipping was also 3- to 7-fold lower in libraries prepared with the Watchmaker kit relative to sonicated DNA libraries, improving overall sequencing economy. Furthermore, the mean target coverage achieved with the Watchmaker kit was comparable to or higher than that of sonication libraries using the same input mass. Because input masses were normalized post-shearing, which typically results in 20-40% sample loss, coverage with our approach is significantly higher than with sonication when normalized to pre-sheared DNA input.
Watchmaker DNA Library Preparation with Fragmentation enables high-quality DNA library preparation from damaged FFPE samples, producing high target coverage and uniform insert size while minimizing sequencing artifacts to improve sensitivity and specificity. This approach is highly scalable and automatable, enabling various oncology applications.
Citation Format: Giulia Corbet, Philip Benson, Kailee Reed, Skyler Mishkin, Thomas Harrison, Kristin Scott, Zane Jaafar, Kristina Giorda, Josh Haimes, Martin Ranik, Brian Kudlow. A novel enzymatic library preparation workflow that dramatically reduces artifacts associated with damaged FFPE samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 215.
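The normalization argument in this abstract (20-40% sample loss during shearing) can be illustrated with a short calculation. This is a minimal sketch, not part of the study: the function name `pre_shear_input_ng` and the 100 ng example mass are assumptions chosen for illustration.

```python
def pre_shear_input_ng(post_shear_ng: float, loss_fraction: float) -> float:
    """DNA mass needed before shearing so that `post_shear_ng` remains afterwards.

    `loss_fraction` is the fraction of mass lost during shearing (0.2 = 20%).
    """
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return post_shear_ng / (1.0 - loss_fraction)


# Hypothetical example: 100 ng measured after shearing, at both ends of the
# 20-40% loss range quoted in the abstract.
for loss in (0.20, 0.40):
    needed = pre_shear_input_ng(100.0, loss)
    print(f"{loss:.0%} loss: {needed:.1f} ng before shearing for 100 ng after")
```

Under these assumptions, a sonicated library compared at equal post-shearing mass has consumed more starting material than an enzymatically fragmented one, which is the basis of the abstract's remark about coverage per unit of pre-sheared input.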
|
2
|
Evaluating the analytical validity of circulating tumor DNA sequencing assays for precision oncology. Nat Biotechnol 2021; 39:1115-1128. [PMID: 33846644 PMCID: PMC8434938 DOI: 10.1038/s41587-021-00857-z] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 07/20/2020] [Accepted: 02/15/2021] [Indexed: 02/08/2023]
Abstract
Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.
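The sampling limitation described in this abstract can be sketched with a simple binomial model. This is an illustration under stated assumptions, not the study's simulation: it assumes roughly 3.3 pg per haploid human genome, that every input molecule is converted into the library, and that at least `k` mutant molecules must be sampled for a call. The names `genome_copies` and `prob_at_least` are hypothetical.

```python
from math import comb

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (pg)


def genome_copies(input_ng: float) -> int:
    """Approximate number of haploid genome copies in `input_ng` of DNA."""
    return int(input_ng * 1000.0 / PG_PER_HAPLOID_GENOME)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of drawing at least k mutant molecules out of n,
    when each molecule is mutant with probability p (the allele frequency)."""
    below_k = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below_k


# Hypothetical scenario: 10 ng input, requiring at least 3 mutant molecules.
copies = genome_copies(10.0)
for vaf in (0.005, 0.001):
    print(f"VAF {vaf:.1%}: P(>=3 mutant molecules) = {prob_at_least(3, copies, vaf):.3f}")
```

In this simplified model the probability of sampling enough mutant molecules falls as allele frequency or input mass decreases, which is consistent with the abstract's observation that false negatives dominate when input is limited.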
|
3
|
Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions. Genome Biol 2021; 22:109. [PMID: 33863344 PMCID: PMC8051090 DOI: 10.1186/s13059-021-02315-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/28/2020] [Accepted: 03/18/2021] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project Phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
|
4
|
A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency. Genome Biol 2021; 22:111. [PMID: 33863366 PMCID: PMC8051128 DOI: 10.1186/s13059-021-02316-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/27/2020] [Accepted: 03/18/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
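The dilution effect of admixing Sample A with Sample B can be sketched as simple arithmetic. This is an illustrative calculation only: the 1:49 mixing ratio is a hypothetical example (the abstract does not state the ratios used), and the sketch assumes Sample B carries no variant allele at the position and that both samples contribute genome copies in proportion to the mixing parts. The function name `admixed_af` is an assumption.

```python
def admixed_af(af_in_a: float, parts_a: float, parts_b: float) -> float:
    """Expected allele frequency after mixing Sample A with a variant-free Sample B.

    Assumes B is homozygous reference at the position and that `parts_a` and
    `parts_b` reflect the relative number of genome copies contributed.
    """
    if parts_a <= 0 or parts_b < 0:
        raise ValueError("parts_a must be positive and parts_b non-negative")
    return af_in_a * parts_a / (parts_a + parts_b)


# Hypothetical example: a 1% variant in Sample A, mixed 1 part A to 49 parts B.
diluted = admixed_af(0.01, 1, 49)
print(f"Expected allele frequency after admixture: {diluted:.4%}")
```

Under these assumptions a variant present at 1% in Sample A would land near 0.02% after a 1:49 admixture, the order of magnitude the abstract describes as suitable for liquid biopsy panels.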
|
5
|
De novo assembly of the olive fruit fly (Bactrocera oleae) genome with linked-reads and long-read technologies minimizes gaps and provides exceptional Y chromosome assembly. BMC Genomics 2020; 21:259. [PMID: 32228451 PMCID: PMC7106766 DOI: 10.1186/s12864-020-6672-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/12/2019] [Accepted: 03/13/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: The olive fruit fly, Bactrocera oleae, is the most important pest in the olive fruit agribusiness industry. This is because female flies lay their eggs in the unripe fruits and, upon hatching, the larvae feed on the fruits, thus destroying them. The lack of a high-quality genome and other genomic and transcriptomic data has hindered progress in understanding the fly's biology and proposing alternative control methods to pesticide use. RESULTS: Genomic DNA was sequenced from male and female Demokritos strain flies, maintained in the laboratory for over 45 years. We used short-, mate-pair-, and long-read sequencing technologies to generate a combined male-female genome assembly (GenBank accession GCA_001188975.2). Genomic DNA sequencing from male insects using 10x Genomics linked-reads technology followed by mate-pair and long-read scaffolding and gap-closing generated a highly contiguous 489 Mb genome with a scaffold N50 of 4.69 Mb and L50 of 30 scaffolds (GenBank accession GCA_001188975.4). RNA-seq data generated from 12 tissues and/or developmental stages allowed for genome annotation. Short reads from both males and females and the chromosome quotient method enabled identification of Y-chromosome scaffolds which were extensively validated by PCR. CONCLUSIONS: The high-quality genome generated represents a critical tool in olive fruit fly research. We provide an extensive RNA-seq data set, and genome annotation, critical towards gaining an insight into the biology of the olive fruit fly. In addition, elucidation of Y-chromosome sequences will advance our understanding of the Y-chromosome's organization, function and evolution and is poised to provide avenues for sterile insect technique approaches.
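For readers unfamiliar with the contiguity statistics quoted in this abstract, the following minimal sketch shows how scaffold N50 and L50 are conventionally computed from a list of scaffold lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions and are unrelated to the actual assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the cumulative length, taken
    from the longest scaffold downwards, first reaches half the assembly size.
    L50 is the number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must contain at least one scaffold")


# Toy example with made-up scaffold lengths (arbitrary units).
n50, l50 = n50_l50([10, 8, 6, 4, 2])
print(f"N50 = {n50}, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates that fewer, longer scaffolds account for half of the assembly.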
|
6
|
Abstract 3520: Detection of low-frequency variants in highly degraded DNA and RNA samples. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-3520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Diagnostic tools based on next generation sequencing are fundamentally transforming clinical oncology. However, there is a lack of adequate library preparation strategies for highly degraded, clinically relevant samples, such as cell-free DNA (cfDNA) and formalin-fixed paraffin-embedded (FFPE) DNA. Due to the extreme heterogeneity of these sample types, targeted sequencing is often used to achieve deep coverage of genomic loci and enable detection of low-frequency variants. Commercially available protocols for library preparation require stringent size-selection to remove adapter-dimers, which reduces library complexity and variant detection power. Achieving high specificity can be challenging because low-frequency artifacts arise from a variety of sources, including DNA extraction, library construction, PCR, hybrid selection, and sequencing. These artifacts can be identified by “duplex sequencing”, where strand-specific unique molecular identifiers (UMIs) are used to confirm the presence of an alteration on both strands of an input molecule. However, duplex sequencing typically delivers low conversion rates with degraded samples due to poor ligation efficiency and template loss during size-selection. Here, we present the IDT library preparation kit optimized for low-input and degraded samples. Our novel library construction chemistry relies on an engineered DNA ligase and proprietary duplexed sequencing adapters that prevent chimeras, suppress dimer-formation (negating the need for size-selection), and enhance variant calling sensitivity. We adapted the workflow for both DNA and RNA applications and demonstrated efficacy using diverse sample types. To assess sensitivity, we created libraries with varied inputs using mixtures of Genome in a Bottle gDNA (NA12878 and NA24385) and performed hybrid capture using a 52 kb custom panel targeting single nucleotide variants (SNVs), copy number variants (CNVs), and gene fusions.
When compared to commercially available methods, our approach yielded a 1.5- to 4-fold increase in library complexity with improved sensitivity to 0.25% variants using 1-25 ng of cfDNA, and 0.5% using 25-250 ng FFPE DNA. We also obtained 100% specificity using duplexed UMI correction, which removed all false-positive calls. RNA libraries were constructed from FFPE NGS reference standards to evaluate the detection of ALK, RET, ROS, NTRK1, and NTRK3 fusions and sequenced to an average target depth of 10,000X. Our method provides superior sensitivity and specificity for detection of low-frequency variants, even with highly degraded DNA.
Citation Format: Ariel Royall, Ushati Das Chakravarty, Katharine Dilger, Manqing Hong, Kevin Lai, Kristina Giorda, Keith Bryan, Yu Wang, Lynette Lewis, Scott Rose, Yu Zheng. Detection of low-frequency variants in highly degraded DNA and RNA samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 3520.
|
7
|
Abstract 4348: DNA-based fusion detection using a pan-cancer tumor profiling 532-oncogene panel. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-4348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Large-scale cancer profiling using next generation sequencing (NGS) has become instrumental to the discovery and identification of new, targetable cancer alterations. A comprehensive set of 532 oncogene targets was combined to create the new xGen® Pan-Cancer Panel V2 for hybrid capture sequencing. This xGen panel covers 2.2 Mb of the human genome, and allows for the simultaneous detection of copy number variations (CNVs), insertions and deletions (indels), gene rearrangements, and microsatellite instability across a wide range of sample types, inputs, and quality. Using a new library prep workflow optimized for low quality samples and low input, panel performance was first evaluated with 30 ng of input DNA using libraries built from matched samples [formalin-fixed paraffin-embedded (FFPE) tumor gDNA, frozen adjacent normal tissue gDNA, and cell-free DNA (cfDNA)] from five lung cancer donors (n = 15). Sample quality ranged from a mean DIN of 4.4 ±1.1 to 8.3 ±0.9 for FFPE tumor gDNA and frozen normal gDNA, respectively. After subsampling to 200X mean target coverage, 96% of target bases had at least 40X coverage for all libraries. Comparative hierarchical clustering analysis was then used to identify lung cancer mutations shared in all tumor samples. NGS gDNA reference standards from Horizon Discovery (HD753, HD798, HD799, and HD803) with verified CNVs, single nucleotide variations (SNVs), amplifications, and fusions were used to evaluate detection rates at different library input masses down to 1 ng. cDNA libraries were also prepared from RNA extracted from FFPE 5-Fusion RNA Multiplex Reference Standards (HD796, HD783). We identified all possible gene fusion events in the positive control using the structural variant caller, LUMPY (https://github.com/arq5x/lumpy-sv). The xGen Pan-Cancer Panel V2 enables a cost-efficient and time-saving approach for the detection of multiple oncogene targets.
Citation Format: Katharine Dilger, Yongming Sun, Kevin Lai, Ushati Das Chakravarty, Nicole Sponer, Kristina Giorda, Patrick Lau, Yu Wang. DNA-based fusion detection using a pan-cancer tumor profiling 532-oncogene panel [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 4348.
|
8
|
Abstract 418: Highly efficient duplex DNA tagging strategy improves accuracy of detecting ultra-low-frequency mutations through consensus read reconstruction. Cancer Res 2018. [DOI: 10.1158/1538-7445.am2018-418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Molecular diagnostics and precise personalized care continue to increase the sensitivity and specificity requirements for detecting low (~5%) to ultra-low (<1%) frequency actionable mutations by next generation sequencing (NGS). These low-frequency variants often occur below the reliable limit of detection of standard NGS as they are confounded by errors introduced during the NGS workflow. We have developed adapters containing unique molecular identifiers (UMIs) that permit tagging of double-stranded DNA and statistical reconstruction of reads sequenced as duplicates. These novel adapters are compatible with standard library preparation and enrichment methods and reagents, yet provide significantly enhanced error correction. Methods: Libraries were prepared from the 3 most commonly used oncology sample types: genomic, FFPE, or plasma-derived, cell-free DNA, using a standard, commercially available kit combined with standard adapters or the novel duplex UMI adapters. The libraries were enriched with a custom xGen Lockdown Panel targeting a 75 kb polymorphic region, and then deep sequencing and variant calling were performed. A consensus read building tool was developed to collapse PCR duplicates based on molecular barcodes, and the tool was used to evaluate the utility of UMIs in error correction. Results: Compared to standard adapters, the duplex adapters presented comparable or better library yield and mean deduplicated sequencing depth for all sample types and input masses tested. To evaluate variant-calling accuracy, we established mixtures of DNA of known SNP genotype to mimic ultra-low-frequency variant samples. For genomic DNA samples, DNA from NA12878 and NA24385 were mixed to generate minor allele frequency (MAF) down to 0.1%, and variant calls were evaluated against annotations in Genome in a Bottle. Commercial FFPE and cell-free DNA samples were genotyped and mixed to present MAF at 0.5%. 
When standard adapters were used, variants present at the MAF were detected with 90% sensitivity for all sample types, but only under conditions that also called >4,000 false positives, resulting in a positive predictive value (PPV) of <3%. Using the duplex UMI adapters in conjunction with consensus read construction, >90% detection sensitivity was achieved for genomic and cell-free DNA samples with 0 false positives, resulting in 100% PPV, a >40-fold error suppression. For FFPE DNA samples, the duplex adapters provided >30-fold PPV improvement compared to standard adapters. Conclusions: The duplex UMI adapters eliminate background NGS errors by collapsing duplicated reads using their unique tags. This leads to unprecedented accuracy in detecting true low-frequency variants, regardless of the input DNA source. Such advances are key to refining diagnostics and improving precision cancer care.
Citation Format: Jiashi Wang, Kevin Lai, Madelyn Light, Layla Katiraee, Kristina Giorda, Mirna Jarosz, Yun Bao, Criss Walworth, David Kupec, Caifu Chen. Highly efficient duplex DNA tagging strategy improves accuracy of detecting ultra-low-frequency mutations through consensus read reconstruction [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 418.
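The idea of consensus read reconstruction with duplex UMIs, as described in this abstract, can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' tool: it assumes reads have already been grouped by UMI and strand, are of equal length, and are oriented to the same reference strand. The function names and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def majority_consensus(reads):
    """Column-wise majority vote over equal-length reads from one strand family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def duplex_consensus(tagged_reads):
    """Build one consensus per UMI, keeping only bases supported by both strands.

    `tagged_reads` is an iterable of (umi, strand, sequence) with strand "+" or "-".
    Families seen on only one strand are dropped; positions where the two
    single-strand consensuses disagree are masked with "N".
    """
    families = defaultdict(lambda: defaultdict(list))
    for umi, strand, sequence in tagged_reads:
        families[umi][strand].append(sequence)

    result = {}
    for umi, by_strand in families.items():
        if "+" not in by_strand or "-" not in by_strand:
            continue  # no duplex support for this molecule
        top = majority_consensus(by_strand["+"])
        bottom = majority_consensus(by_strand["-"])
        result[umi] = "".join(a if a == b else "N" for a, b in zip(top, bottom))
    return result


# Toy input: one PCR error in family AAT, a strand-specific artifact in GGA,
# and a single-strand family CCG that lacks duplex support.
toy_reads = [
    ("AAT", "+", "ACGT"), ("AAT", "+", "ACGT"), ("AAT", "+", "ACTT"),
    ("AAT", "-", "ACGT"), ("AAT", "-", "ACGT"),
    ("CCG", "+", "ACTT"),
    ("GGA", "+", "ACTT"), ("GGA", "-", "ACGT"),
]
print(duplex_consensus(toy_reads))
```

In this sketch, an error present in a minority of PCR duplicates is outvoted within its family, while a change seen on only one strand of the original molecule is masked rather than reported, which is the intuition behind the error suppression the abstract reports.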
|
9
|
Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing. BMC Genomics 2018; 19:30. [PMID: 29310587 PMCID: PMC5759201 DOI: 10.1186/s12864-017-4428-5] [Citation(s) in RCA: 114] [Impact Index Per Article: 19.0] [Received: 09/21/2017] [Accepted: 12/29/2017] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Sample index cross-talk can result in false positive calls when massively parallel sequencing (MPS) is used for sensitive applications such as low-frequency somatic variant discovery, ancient DNA investigations, microbial detection in human samples, or circulating cell-free tumor DNA (ctDNA) variant detection. Therefore, the limit-of-detection of an MPS assay is directly related to the degree of index cross-talk. RESULTS: Cross-talk rates up to 0.29% were observed when using standard, combinatorial adapters, resulting in 110,180 (0.1% cross-talk rate) or 1,121,074 (0.29% cross-talk rate) misassigned reads per lane in non-patterned and patterned Illumina flow cells, respectively. Here, we demonstrate that using unique, dual-matched indexed adapters dramatically reduces index cross-talk to ≤1 misassigned read per flow cell lane. While the current study was performed using dual-matched indices, using unique, dual-unrelated indices would also be an effective alternative. CONCLUSIONS: For sensitive downstream analyses, the use of combinatorial indices for multiplexed hybrid capture and sequencing is inappropriate, as it results in an unacceptable number of misassigned reads. Cross-talk can be virtually eliminated using dual-matched indexed adapters. These results suggest that use of such adapters is critical to reduce false positive rates in assays that aim to identify low allele frequency events, and strongly indicate that dual-matched adapters be implemented for all sensitive MPS applications.
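Why unique dual indices suppress cross-talk while combinatorial indices do not can be shown with a toy demultiplexer. This is a conceptual sketch only: the index labels (`A7`, `X5`, and so on), sample names, and the `demultiplex` function are hypothetical placeholders and not real adapter sequences or a real demultiplexing tool.

```python
def demultiplex(i7, i5, sample_sheet):
    """Return the sample assigned to an (i7, i5) index pair, or None if the
    pair is not listed in the sample sheet."""
    return sample_sheet.get((i7, i5))


# Combinatorial design: every i7/i5 combination is a valid sample.
combinatorial = {
    ("A7", "X5"): "sample1", ("A7", "Y5"): "sample2",
    ("B7", "X5"): "sample3", ("B7", "Y5"): "sample4",
}

# Unique dual design: each index is used by exactly one sample.
unique_dual = {("A7", "X5"): "sample1", ("B7", "Y5"): "sample2"}

# A read whose i5 index was swapped during pooling, capture, or clustering.
swapped_pair = ("A7", "Y5")

print("combinatorial:", demultiplex(*swapped_pair, combinatorial))
print("unique dual:  ", demultiplex(*swapped_pair, unique_dual))
```

With the combinatorial sheet the swapped pair is a legitimate combination and the read is silently assigned to another sample; with unique dual indices the pair matches no sample and the read can be discarded.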
|
10
|
Abstract 397: Whole genome copy number variation analysis using a SNP-focused targeted sequencing panel for tumor analysis. Cancer Res 2017. [DOI: 10.1158/1538-7445.am2017-397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Accurate genome-wide copy number variation (CNV) analysis is critical for disease and cancer research. Current approaches for CNV analysis include fluorescence in situ hybridization (FISH), array comparative genomic hybridization (array CGH), and SNP arrays. Unfortunately, these methods are not sensitive enough for real-world cancer samples because of tumor ploidy, purity and heterogeneity. NGS-based targeted sequencing is increasingly being used for CNV analysis due to throughput, coverage, cost, and sample input requirements. For CNV analysis, detection power is improved by combining both read depth and SNP allele frequency analysis, particularly for copy-neutral events such as loss of heterozygosity (LOH). A custom xGen Lockdown CNV backbone panel was developed for broad, uniform genome coverage and to enrich for population-based SNPs. We demonstrate use of the panel as an addition to the xGen Exome Research Panel and a custom cancer-focused panel. Downstream analysis incorporates both read depth and observed minor allele frequencies to determine CNVs with enhanced sensitivity. To increase the resolution for large-scale alterations of chromosome 7, a hot-spot for disease-associated CNVs, probe density was increased 6-fold. A known standard, NA12878, was used to validate the panel’s ability to detect heterozygous SNPs with high confidence. In addition, mixtures of cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) were tested with varying levels of background copy-neutral genomic DNA. The sensitivity and specificity of the panel to detect CNV and LOH events were assessed using deep exome and Affymetrix SNP array data. The ability to detect copy number alterations with high resolution and accuracy would be a valuable resource for disease and cancer research.
Citation Format: Jiashi Wang, Kristina Giorda, Zhongwu Lai, Daniel Stetson, Mirna Jarosz. Whole genome copy number variation analysis using a SNP-focused targeted sequencing panel for tumor analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 397. doi:10.1158/1538-7445.AM2017-397
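The benefit of combining read depth with SNP allele frequencies, as this abstract describes, can be sketched with a toy segment classifier. This is an illustration under stated assumptions, not the authors' analysis: the function name `classify_segment` and both tolerance thresholds are hypothetical, and real pipelines model tumor purity, ploidy, and noise explicitly.

```python
from math import log2


def classify_segment(tumor_depth, normal_depth, het_snp_afs,
                     depth_tol=0.2, af_tol=0.1):
    """Classify a genomic segment from depth and allele-frequency evidence.

    `het_snp_afs` holds tumor allele frequencies at SNPs that are heterozygous
    in the matched normal (expected near 0.5 when both alleles are retained).
    `depth_tol` and `af_tol` are illustrative thresholds, not validated values.
    """
    if not het_snp_afs:
        raise ValueError("at least one heterozygous SNP is required")
    log_ratio = log2(tumor_depth / normal_depth)
    af_shift = sum(abs(af - 0.5) for af in het_snp_afs) / len(het_snp_afs)

    if log_ratio > depth_tol:
        return "gain"
    if log_ratio < -depth_tol:
        return "loss"
    if af_shift > af_tol:
        return "copy-neutral LOH"  # depth unchanged, but allelic balance lost
    return "neutral"


# Toy segments: depth alone cannot separate the last two cases.
print(classify_segment(200, 100, [0.33, 0.67]))
print(classify_segment(100, 100, [0.05, 0.95, 0.90]))
print(classify_segment(100, 100, [0.48, 0.52]))
```

The second and third toy segments have identical depth ratios; only the allele-frequency evidence distinguishes a copy-neutral LOH event from an unaltered region.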
|
11
|
Assessing the suitability of NGS panels for clinical sequencing. MLO: MEDICAL LABORATORY OBSERVER 2017; 49:36-37. [PMID: 29979010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
|
12
|
Abstract 3602: Linked-Reads enable detailed, phased resolution of structural variation in the cancer genome. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-3602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Studies have shown that somatic structural variation (SV) plays a key role in the oncogenic process. Traditionally SVs in the cancer genome have been detected using low resolution cytogenetic approaches, such as FISH, or microarray-based techniques. More recently, next-generation sequencing (NGS)-based technologies have been employed to detect SVs, including indels and translocations. However, both short- and long-read NGS-based approaches are limited in their ability to accurately identify SV events and delineate their breakpoints due to the limitations inherent in assembly of billions of short-read sequences across a heterogeneous cancer sample, as well as the costly and burdensome laboratory infrastructure associated with long-read sequencers. We utilized a novel technology that combines microfluidics and molecular barcoding to generate libraries that are sequenced with an Illumina system. Open-source bioinformatics software produces linked-reads that maintain long-range information and single molecule sensitivity.
Cell lines and cancer samples were obtained from commercial sources, and genomic DNA was extracted. DNA sample indexing and partitioning was performed using the 10X Genomics GemCode instrument. One ng of sample DNA was used as input for each reaction, and DNA molecules were partitioned into droplets to fragment the DNA and introduce molecular barcodes. Following barcoding, droplets were fractured, and library DNA was purified and sequenced on Illumina sequencers. The GemCode Long Ranger software suite was used to map sequencing reads back to original long molecules of DNA, generating reads linked to partition barcodes. Thus we can generate phased sequences covering many tens to hundreds of kilobases.
We first benchmarked the ability to call multiple SV types using a well-characterized germline HapMap sample (NA12878) as well as two recently characterized haploid hydatidiform moles (CHM1 and CHM13) that have been studied with multiple orthogonal technologies. Regions with evidence for structural variation were reassembled into distinct haplotypes. The barcode information allowed us to both phase the structural variants we detected and disambiguate calls within highly repetitive regions, such as segmental duplications. We demonstrated high concordance with alternative approaches across all major classes of SVs, including long insertions and deletions as well as copy-neutral events. In cancer cell lines, we detected well-annotated gene fusions, such as the EML4/ALK and ALK/PTPN3 fusions in the lung cancer cell line NCI-H2228, and the SLC26A/PRKAR2A fusion in the triple negative breast cancer cell line HCC38.
Citation Format: Sofia Kyriazopoulou-Panagiotopoulou, Patrick Marks, Haynes Heaton, Heather Ordonez, Kristina Giorda, Cassandra Jabara, Billy Lau, John M. Bell, Michael Schnall-Levin, Hanlee P. Ji. Linked-Reads enable detailed, phased resolution of structural variation in the cancer genome. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 3602.
|
13
|
Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 2016; 3:160025. [PMID: 27271295 PMCID: PMC4896128 DOI: 10.1038/sdata.2016.25] [Citation(s) in RCA: 385] [Impact Index Per Article: 48.1] [Received: 09/09/2015] [Accepted: 03/15/2016] [Indexed: 02/01/2023] Open
Abstract
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
|
14
|
Molecular Cytogenetics Using Linked-Reads. Cancer Genet 2016. [DOI: 10.1016/j.cancergen.2016.04.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
15
|
A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods 2016; 13:587-90. [PMID: 27159086 PMCID: PMC4927370 DOI: 10.1038/nmeth.3865] [Citation(s) in RCA: 149] [Impact Index Per Article: 18.6] [Received: 01/12/2016] [Accepted: 04/08/2016] [Indexed: 12/11/2022]
Abstract
Despite tremendous progress in genome sequencing, the basic goal of producing phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe a new approach to perform de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics Linked-Read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.
|
16
|
Health and population effects of rare gene knockouts in adult humans with related parents. Science 2016; 352:474-7. [PMID: 26940866 DOI: 10.1126/science.aac8624] [Citation(s) in RCA: 202] [Impact Index Per Article: 25.3] [Received: 11/12/2015] [Accepted: 02/18/2016] [Indexed: 12/13/2022]
Abstract
Examining complete gene knockouts within a viable organism can inform on gene function. We sequenced the exomes of 3222 British adults of Pakistani heritage with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of function (knockouts) in 781 genes. We observed 13.7% fewer homozygous knockout genotypes than we expected, implying an average load of 1.6 recessive-lethal-equivalent loss-of-function (LOF) variants per adult. When genetic data were linked to the individuals' lifelong health records, we observed no significant relationship between gene knockouts and clinical consultation or prescription rate. In this data set, we identified a healthy PRDM9-knockout mother and performed phased genome sequencing on her, her child, and control individuals. Our results show that meiotic recombination sites are localized away from PRDM9-dependent hotspots. Thus, natural LOF variants inform on essential genetic loci and demonstrate PRDM9 redundancy in humans.
|
17
|
Abstract 4742: Using 1ng of DNA to detect haplotype phasing and gene fusions from whole exome sequencing of cancer cell lines. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-4742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
We have used a new platform from 10X Genomics to obtain ultra-deep and long-range exome sequencing data from 1ng of cancer cell line DNA. While traditional Whole Exome Sequencing (WES) methods have inputs of ∼50-500ng and lack long-range data, we demonstrate that the 10X platform can successfully obtain both gene fusion and haplotype phasing information along with standard SNP and indel variant-calling from WES on an Illumina HiSeq using just 1ng of DNA.
The 10X platform massively partitions long DNA into >100,000 individual reactions. Each partition generates sequencing libraries with a unique barcode; the integrated downstream analysis platform uses the barcodes to map Illumina's short reads back to original long DNA fragments. Because the barcoding is performed prior to exome enrichment, the long-range information conveyed by the barcodes is retained even after standard hybrid capture enrichment. Powerful algorithms are implemented using a custom informatics toolset that is self-contained and easy to install, and the resulting structural variant calls can be easily visualized and explored using 10X's haplotype-aware genome browser.
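The barcode-to-fragment step described above can be illustrated with a minimal sketch. It is not the 10X analysis pipeline: the input format `(barcode, chrom, pos)`, the function name `infer_molecules`, and the `max_gap` threshold are all assumptions chosen for illustration. The idea is only that aligned short reads sharing a barcode and lying close together on a chromosome are treated as originating from the same long DNA fragment.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group aligned reads by barcode and split them into putative molecules.

    reads   : iterable of (barcode, chrom, pos) tuples (hypothetical format).
    max_gap : assumed distance threshold in bp; consecutive reads with the
              same barcode and chromosome that are farther apart than this
              are assigned to different molecules.
    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    # Collect read positions per (barcode, chromosome) pair.
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


if __name__ == "__main__":
    example = [
        ("BC1", "chr2", 100),
        ("BC1", "chr2", 20_000),
        ("BC1", "chr2", 500_000),
        ("BC2", "chr2", 150),
    ]
    for molecule in infer_molecules(example):
        print(molecule)
```

In this toy input, the two nearby `BC1` reads form one putative molecule, while the distant `BC1` read and the `BC2` read each form their own. A real pipeline would also have to handle barcode collisions and sequencing errors, which this sketch ignores.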
We used the 10X platform to analyze cancer cell lines with previously reported gene fusions. In all cases described, the fusion breakpoints were in introns of > 5.9kbp, and would therefore not be detectable with standard WES. In lung cancer cell line NCI-H2228, we called two annotated fusions, EML4/ALK and ALK/PTPN3. Two fusion genes (MCHR2/ANO4 and CR627240/SACS) were identified in glioblastoma cell line U-87MG, and at least four (CD44/FGFR2, PVT1/PDHX, PVT1/ATE1, PVT1/PPAPDC1A) in gastric cancer cell line SNU-16.
Long-range information provided by the technology also enables haplotype phasing of exome samples. Our algorithms phased the vast majority of SNPs within genes, with phase blocks up to 1.25Mbp. We used this haplotype information to phase somatic structural variants, including large-scale deletions and amplifications.
To achieve the sensitivity required to call somatic mutations from low-purity tumor specimens, high coverage depth is necessary. This can be a significant challenge for low inputs; however, we demonstrate duplicate-removed depth of >200x from inputs as low as 1ng, and sensitivity and precision for variant calling from 1ng libraries that are equivalent to those from ligation-based libraries prepared from 200ng of DNA.
It is becoming increasingly apparent that a complete understanding of tumor genomic profiles requires the ability to detect structural variation and phasing information from low input and high coverage sequencing libraries. With this study we demonstrate the ability of the 10X platform to enable such applications from ultra-deep targeted sequencing data using only 1ng of DNA.
Citation Format: Mirna Jarosz, Michael Schnall-Levin, Grace X. Y. Zheng, Patrick Marks, Sofia Kyriazopoulou-Panagiotopoulou, Patrice Mudivarti, Kristina Giorda. Using 1ng of DNA to detect haplotype phasing and gene fusions from whole exome sequencing of cancer cell lines. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4742. doi:10.1158/1538-7445.AM2015-4742
|
18
|
Abstract 4882: Megabase-scale phased haplotypes of genetic aberrations from whole cancer genome sequencing of primary colorectal tumors. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-4882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Cancer genomes contain multiple types of genetic aberrations that include mutations, deletions, copy number variants and chromosomal rearrangements. Despite advances in next generation sequencing, it remains a major challenge to delineate many of these somatic genomic alterations because of the intrinsic complexity of cancer genomes. Haplotyping involves the assignment of genetic variants such as mutations and structural variants to specific segments of homologous chromosomes. Experimentally determined phasing of cancer genomes offers an opportunity to resolve complex genomic structures such as somatic rearrangements, aneuploidy composition and ongoing evolutionary changes. However, contiguous phasing of cancer genomes on a megabase (Mb) scale remains difficult to achieve with sequencing-based approaches.
In this proof-of-concept study, we experimentally determined Mb-scale haplotypes of primary tumor samples via whole genome sequencing. To generate haplotypes, we employed an automated instrument that partitions long DNA fragments into hundreds of thousands of reactions, each of which incorporates a unique, nonrandom barcode into indexed sequencing libraries. Given the need to amplify from sparse numbers of molecules and the high efficiency of the automated sequencing library construction process, the DNA requirements for each sample are less than 5 ng.
We sequenced the genomes of primary colorectal cancer samples and their matched normal diploid DNA with an Illumina sequencer. We used the single nucleotide variants to generate Mb-scale haplotype blocks (N50 of 1.2 Mb), with phased haplotype blocks of up to 11.3 Mb. We were able to delineate cancer genome haplotypes that cover allelic imbalances, copy number variations such as deletions and other genomic instability events. Structural variants were identified in the context of their position in specific chromosome homologues. Thus, we improved the characterization of somatic genetic aberrations using contiguity mapping and cancer genome haplotypes in the context of whole cancer genome sequencing. Overall, we demonstrated the feasibility and potential utility of generating contiguous phased haplotypes in whole cancer genome sequencing from primary tumor samples.
Citation Format: Billy Lau, John M. Bell, Michael Schnall-Levin, Mirna Jarosz, Erik Hopmans, Christina M. Wood, Grace X. Zheng, Kristina Giorda, Hanlee P. Ji. Megabase-scale phased haplotypes of genetic aberrations from whole cancer genome sequencing of primary colorectal tumors. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4882. doi:10.1158/1538-7445.AM2015-4882
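For readers unfamiliar with the N50 statistic quoted for the haplotype blocks in this abstract, the following is a small sketch of the common definition: the block length at which blocks of that length or longer account for at least half of the total phased length. The function name `n50` and the example lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are sorted from longest to shortest; the N50 is the length of the
    block at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical phase-block lengths in Mb, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

Because the statistic is weighted by length, a few very long blocks raise the N50 far more than many short ones, which is why it is often reported alongside the maximum block size.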
|
19
|
Examining Irf4 genomic programming of lineage development using limited populations of purified immune cells via an optimized protocol for ChIP‐seq (LB176). FASEB J 2014. [DOI: 10.1096/fasebj.28.1_supplement.lb176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
20
|
RNA‐seq to identify novel markers for neural tissue differentiation (LB211). FASEB J 2014. [DOI: 10.1096/fasebj.28.1_supplement.lb211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|