451
|
Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010; 11:685-96. [PMID: 20847746 DOI: 10.1038/nrg2841] [Citation(s) in RCA: 778] [Impact Index Per Article: 51.9] [Indexed: 02/07/2023]
|
452
|
Ding L, Wendl MC, Koboldt DC, Mardis ER. Analysis of next-generation genomic data in cancer: accomplishments and challenges. Hum Mol Genet 2010; 19:R188-96. [PMID: 20843826 DOI: 10.1093/hmg/ddq391] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.3] [Indexed: 01/05/2023]
Abstract
The application of next-generation sequencing technology has produced a transformation in cancer genomics, generating large data sets that can be analyzed in different ways to answer a multitude of questions about the genomic alterations associated with the disease. Analytical approaches can discover focused mutations such as substitutions and small insertion/deletions, large structural alterations and copy number events. As our capacity to produce such data for multiple cancers of the same type improves, so does the demand to analyze multiple tumor genomes simultaneously. For example, pathway-based analyses that provide the full mutational impact on cellular protein networks and correlation analyses aimed at revealing causal relationships between genomic alterations and clinical presentations are both enabled. As the repertoire of data grows to include mRNA-seq, non-coding RNA-seq and methylation for multiple genomes, our challenge will be to intelligently integrate data types and genomes to produce a coherent picture of the genetic basis of cancer.
Affiliation(s)
- Li Ding
- Department of Genetics, The Genome Center at Washington University School of Medicine, 4444 Forest Park Blvd., St Louis, MO 63108, USA
|
453
|
Magi A, Benelli M, Gozzini A, Girolami F, Torricelli F, Brandi ML. Bioinformatics for next generation sequencing data. Genes (Basel) 2010; 1:294-307. [PMID: 24710047 PMCID: PMC3954090 DOI: 10.3390/genes1020294] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 07/27/2010] [Revised: 08/30/2010] [Accepted: 09/14/2010] [Indexed: 12/31/2022]
Abstract
The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and the management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of software tools already exist for analyzing NGS data. These tools fit into several general categories, including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in choosing among the available computational tools for each step of the data analysis workflow.
Affiliation(s)
- Alberto Magi
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Matteo Benelli
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Alessia Gozzini
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Francesca Girolami
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Francesca Torricelli
- Diagnostic Genetic Unit, Careggi Hospital, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy.
- Maria Luisa Brandi
- Department of Internal Medicine, University of Florence Medical School, Florence, Italy.
|
454
|
Gorringe KL, George J, Anglesio MS, Ramakrishna M, Etemadmoghadam D, Cowin P, Sridhar A, Williams LH, Boyle SE, Yanaihara N, Okamoto A, Urashima M, Smyth GK, Campbell IG, Bowtell DDL. Copy number analysis identifies novel interactions between genomic loci in ovarian cancer. PLoS One 2010; 5. [PMID: 20844748 PMCID: PMC2937017 DOI: 10.1371/journal.pone.0011408] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Received: 02/11/2010] [Accepted: 04/16/2010] [Indexed: 12/29/2022]
Abstract
Ovarian cancer is a heterogeneous disease displaying complex genomic alterations, and consequently, it has been difficult to determine the most relevant copy number alterations with the scale of studies to date. We obtained genome-wide copy number alteration (CNA) data from four different SNP array platforms, with a final data set of 398 ovarian tumours, mostly of the serous histological subtype. Frequent CNA aberrations targeted many thousands of genes. However, high-level amplicons and homozygous deletions enabled filtering of this list to the most relevant. The large data set enabled refinement of minimal regions and identification of rare amplicons such as at 1p34 and 20q11. We performed a novel co-occurrence analysis to assess cooperation and exclusivity of CNAs and analysed their relationship to patient outcome. Positive associations were identified between gains on 19 and 20q, gain of 20q and loss of X, and between several regions of loss, particularly 17q. We found weak correlations of CNA at genomic loci such as 19q12 with clinical outcome. We also assessed genomic instability measures and found a correlation of the number of higher amplitude gains with poorer overall survival. By assembling the largest collection of ovarian copy number data to date, we have been able to identify the most frequent aberrations and their interactions.
Affiliation(s)
- Kylie L Gorringe
- Victorian Breast Cancer Research Consortium (VBCRC) Cancer Genetics Laboratory, Peter MacCallum Cancer Centre, East Melbourne, Australia
|
455
|
Now and future of mouse mutagenesis for human disease models. J Genet Genomics 2010; 37:559-72. [DOI: 10.1016/s1673-8527(09)60076-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/26/2010] [Revised: 07/30/2010] [Accepted: 07/31/2010] [Indexed: 11/20/2022]
|
456
|
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010. [PMID: 20644199 DOI: 10.1101/gr.107524.110] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 04/14/2023]
Abstract
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Affiliation(s)
- Aaron McKenna
- Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
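The map/reduce pattern that the GATK abstract above describes can be illustrated with a toy coverage calculator. This is only a minimal sketch in Python, not the GATK's own Java API: the `Read` type and the functions `depth_at` and `mean_coverage` are hypothetical names introduced here, and reads are assumed to be simple half-open intervals on a single contig.

```python
from functools import reduce
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: half-open interval [start, end) on one contig."""
    start: int
    end: int


def depth_at(locus: int, reads: List[Read]) -> int:
    """Map step: number of reads overlapping a single locus."""
    return sum(1 for r in reads if r.start <= locus < r.end)


def mean_coverage(reads: List[Read], region_start: int, region_end: int) -> float:
    """Apply the map step to every locus, then reduce the depths to a mean."""
    if region_end <= region_start:
        raise ValueError("region must contain at least one locus")
    depths = map(lambda locus: depth_at(locus, reads), range(region_start, region_end))
    total = reduce(lambda acc, depth: acc + depth, depths, 0)
    return total / (region_end - region_start)


reads = [Read(0, 4), Read(2, 6)]
print(mean_coverage(reads, 0, 4))
```

The point of the separation is the one the abstract makes: the per-locus calculation (`depth_at`) knows nothing about how loci are traversed, so the traversal could in principle be sharded or parallelized without touching the analysis logic.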
|
457
|
Medvedev P, Fiume M, Dzamba M, Smith T, Brudno M. Detecting copy number variation with mated short reads. Genome Res 2010; 20:1613-22. [PMID: 20805290 DOI: 10.1101/gr.106344.110] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.1] [Indexed: 01/09/2023]
Abstract
The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets.
Affiliation(s)
- Paul Medvedev
- Department of Computer Science, University of Toronto, Toronto, Ontario M5R 3G4, Canada
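The idea in the abstract above, supplementing depth-of-coverage with discordantly mapped mate pairs, can be illustrated with a deliberately simplified sketch. This is not CNVer and does not build its donor graph. It only flags fixed-size windows whose read depth departs from the median and that also contain at least one discordant pair. All function names, the window scheme, and the `fold` threshold are assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple


def window_depths(read_starts: List[int], genome_length: int, window: int) -> List[int]:
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < genome_length:
            counts[start // window] += 1
    return counts


def discordant_per_window(pairs: List[Tuple[int, int]], genome_length: int, window: int,
                          expected_insert: int, tolerance: int) -> List[int]:
    """Count mate pairs (left_start, insert_size) whose insert size is unexpected."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for left_start, insert_size in pairs:
        in_range = 0 <= left_start < genome_length
        if in_range and abs(insert_size - expected_insert) > tolerance:
            counts[left_start // window] += 1
    return counts


def candidate_cnv_windows(depths: List[int], discordant: List[int],
                          fold: float = 1.5) -> List[Tuple[int, str]]:
    """Flag windows with both a depth shift and discordant-pair support."""
    baseline = median(depths)
    calls = []
    for index, (depth, n_discordant) in enumerate(zip(depths, discordant)):
        if n_discordant == 0 or baseline == 0:
            continue  # require paired-end support and a usable baseline
        if depth >= fold * baseline:
            calls.append((index, "gain"))
        elif depth <= baseline / fold:
            calls.append((index, "loss"))
    return calls


starts = [5] * 10 + [105] * 10 + [205] * 20 + [305] * 3
depths = window_depths(starts, genome_length=400, window=100)
support = discordant_per_window([(205, 900), (305, 900), (5, 300)], 400, 100,
                                expected_insert=300, tolerance=50)
print(candidate_cnv_windows(depths, support))
```

Requiring both signals is the design choice that matters here: a depth shift alone may reflect the sequencing biases the abstract mentions, so the sketch treats discordant pairs as independent corroboration.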
|
458
|
Jones SJM, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, Fejes AP, Griffith M, Yee J, Martin M, Mayo M, Melnyk N, Morin RD, Pugh TJ, Severson T, Shah SP, Sutcliffe M, Tam A, Terry J, Thiessen N, Thomson T, Varhol R, Zeng T, Zhao Y, Moore RA, Huntsman DG, Birol I, Hirst M, Holt RA, Marra MA. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 2010; 11:R82. [PMID: 20696054 PMCID: PMC2945784 DOI: 10.1186/gb-2010-11-8-r82] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.1] [Received: 04/12/2010] [Revised: 07/08/2010] [Accepted: 08/09/2010] [Indexed: 01/26/2023]
Abstract
BACKGROUND: Adenocarcinomas of the tongue are rare and represent the minority (20 to 25%) of salivary gland tumors affecting the tongue. We investigated the utility of massively parallel sequencing to characterize an adenocarcinoma of the tongue, before and after treatment. RESULTS: In the pre-treatment tumor we identified 7,629 genes within regions of copy number gain. There were 1,078 genes that exhibited increased expression relative to the blood and unrelated tumors, and four genes contained somatic protein-coding mutations. Our analysis suggested the tumor cells were driven by the RET oncogene. Genes whose protein products are targeted by the RET inhibitors sunitinib and sorafenib correlated with being amplified and/or highly expressed. Consistent with our observations, administration of sunitinib was associated with stable disease lasting 4 months, after which the lung lesions began to grow. Administration of sorafenib and sulindac provided disease stabilization for an additional 3 months, after which the cancer progressed and new lesions appeared. A recurring metastasis possessed 7,288 genes within copy number amplicons, 385 genes exhibiting increased expression relative to other tumors and 9 new somatic protein-coding mutations. The observed mutations and amplifications were consistent with therapeutic resistance arising through activation of the MAPK and AKT pathways. CONCLUSIONS: We conclude that complete genomic characterization of a rare tumor has the potential to aid in clinical decision making and identifying therapeutic approaches where no established treatment protocols exist. These results also provide direct in vivo genomic evidence for mutational evolution within a tumor under drug selection and potential mechanisms of drug resistance accrual.
Affiliation(s)
- Steven JM Jones
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Janessa Laskin
- British Columbia Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada
- Yvonne Y Li
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Obi L Griffith
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Jianghong An
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Mikhail Bilenky
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Yaron S Butterfield
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Timothee Cezard
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Eric Chuah
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Richard Corbett
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Anthony P Fejes
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Malachi Griffith
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- John Yee
- Vancouver General Hospital, West 12th Avenue, Vancouver, BC, V5Z 1M9, Canada
- Montgomery Martin
- British Columbia Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada
- Michael Mayo
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Nataliya Melnyk
- Centre for Translational and Applied Genomics of British Columbia Cancer Agency and the Provincial Health Services Authority Laboratories, 600 West 10th Avenue, Vancouver, V5Z 4E6, BC, Canada
- Ryan D Morin
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Trevor J Pugh
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Tesa Severson
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Sohrab P Shah
- Centre for Translational and Applied Genomics of British Columbia Cancer Agency and the Provincial Health Services Authority Laboratories, 600 West 10th Avenue, Vancouver, V5Z 4E6, BC, Canada
- Molecular Oncology, BC Cancer Research Centre, 601 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada
- Margaret Sutcliffe
- British Columbia Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada
- Angela Tam
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Jefferson Terry
- Centre for Translational and Applied Genomics of British Columbia Cancer Agency and the Provincial Health Services Authority Laboratories, 600 West 10th Avenue, Vancouver, V5Z 4E6, BC, Canada
- Nina Thiessen
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Thomas Thomson
- British Columbia Cancer Agency, 600 West 10th Avenue, Vancouver, BC, V5Z 4E6, Canada
- Richard Varhol
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Thomas Zeng
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Yongjun Zhao
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Richard A Moore
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- David G Huntsman
- Vancouver General Hospital, West 12th Avenue, Vancouver, BC, V5Z 1M9, Canada
- Inanc Birol
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Martin Hirst
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Robert A Holt
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Marco A Marra
- Genome Sciences Centre, British Columbia Cancer Agency, 570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
|
459
|
Abstract
The field of epigenetics is now capitalizing on the vast number of emerging technologies, largely based on second-generation sequencing, which interrogate DNA methylation status and histone modifications genome-wide. However, getting an exhaustive and unbiased view of a methylome at a reasonable cost is proving to be a significant challenge. In this article, we take a closer look at the impact of the DNA sequence and bias effects introduced to datasets by genome-wide DNA methylation technologies and, where possible, explore the bioinformatics tools that deconvolve them. There remains much to be learned about the performance of genome-wide technologies, the data we mine from these assays and how they reflect the actual biology. While there are several methods to interrogate the DNA methylation status genome-wide, our opinion is that no single technique suitably covers the minimum criteria of high coverage and high resolution at a reasonable cost. In fact, the fraction of the methylome that is studied currently depends entirely on the inherent biases of the protocol employed. There is promise for this to change, as the third generation of sequencing technologies is expected to again 'revolutionize' the way that we study genomes and epigenomes.
Affiliation(s)
- Mark D Robinson
- Epigenetics Laboratory, Cancer Program, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW 2010, Australia.
|
460
|
Sugimura H, Mori H, Nagura K, Kiyose SI, Tao H, Isozaki M, Igarashi H, Shinmura K, Hasegawa A, Kitayama Y, Tanioka F. Fluorescence in situ hybridization analysis with a tissue microarray: 'FISH and chips' analysis of pathology archives. Pathol Int 2010; 60:543-550. [PMID: 20618731 DOI: 10.1111/j.1440-1827.2010.02561.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
Practicing pathologists expect major somatic genetic changes in cancers, because the morphological deviations in the cancers they diagnose are so great that the somatic genetic changes directing these tumor phenotypes are presumed to be correspondingly large. Several lines of evidence, especially those generated by high-throughput genomic sequencing and genome-wide analyses of cancer DNAs, are confirming these expectations. This article reviews a comprehensive morphological approach to pathology archives that consists of fluorescence in situ hybridization with bacterial artificial chromosome (BAC) probes and screening with tissue microarrays to detect structural changes in chromosomes (copy number alterations and rearrangements) in specimens of human solid tumors. The potential of this approach for individually tailored medical practice, especially in cancer therapy, is discussed.
Affiliation(s)
- Haruhiko Sugimura
- Department of Pathology, Hamamatsu University School of Medicine, 1-20-1, Handayama, Higashi-ward, Hamamatsu 431-3192, Japan.
|
461
|
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010. [PMID: 20644199 DOI: 10.1101/gr.107524.110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
462
|
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010; 20:1297-303. [PMID: 20644199 DOI: 10.1101/gr.107524.110] [Citation(s) in RCA: 18687] [Impact Index Per Article: 1245.8] [Indexed: 02/07/2023]
|
463
|
Storlazzi CT, Lonoce A, Guastadisegni MC, Trombetta D, D'Addabbo P, Daniele G, L'Abbate A, Macchia G, Surace C, Kok K, Ullmann R, Purgato S, Palumbo O, Carella M, Ambros PF, Rocchi M. Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure. Genome Res 2010; 20:1198-206. [PMID: 20631050 DOI: 10.1101/gr.106252.110] [Citation(s) in RCA: 178] [Impact Index Per Article: 11.9] [Indexed: 02/07/2023]
Abstract
Double minutes (dmin) and homogeneously staining regions (hsr) are the cytogenetic hallmarks of genomic amplification in cancer. Different mechanisms have been proposed to explain their genesis. Recently, our group showed that the MYC-containing dmin in leukemia cases arise by excision and amplification (episome model). In the present paper we investigated 10 cell lines from solid tumors showing MYCN amplification as dmin or hsr. Particularly revealing results were provided by the two subclones of the neuroblastoma cell line STA-NB-10, one showing dmin-only and the second hsr-only amplification. Both subclones showed a deletion, at 2p24.3, whose extension matched the amplicon extension. Additionally, the amplicon structure of the dmin and hsr forms was identical. This strongly argues that the episome model, already demonstrated in leukemias, applies to solid tumors as well, and that dmin and hsr are two faces of the same coin. The organization of the duplicated segments varied from very simple (no apparent changes from the normal sequence) to very complex. MYCN was always overexpressed (significantly overexpressed in three cases). The fusion junctions, always mediated by nonhomologous end joining, occasionally juxtaposed truncated genes in the same transcriptional orientation. Fusion transcripts involving NBAS (also known as NAG), FAM49A, BC035112 (also known as NCRNA00276), and SMC6 genes were indeed detected, although their role in the context of the tumor is not clear.
|
464
|
Resta N, Giorda R, Bagnulo R, Beri S, Della Mina E, Stella A, Piglionica M, Susca FC, Guanti G, Zuffardi O, Ciccone R. Breakpoint determination of 15 large deletions in Peutz-Jeghers subjects. Hum Genet 2010; 128:373-82. [PMID: 20623358 DOI: 10.1007/s00439-010-0859-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 06/03/2010] [Accepted: 06/30/2010] [Indexed: 12/17/2022]
Abstract
The Peutz-Jeghers Syndrome (PJS) is an autosomal dominant polyposis disorder with increased risk of multiple cancers. STK11/LKB1 (hereafter named STK11) germline mutations account for the large majority of PJS cases whereas large deletions account for about 30% of the cases. We report here the first thorough molecular characterization of 15 large deletions identified in a cohort of 51 clinically well-characterized PJS patients. The deletions were identified by MLPA analysis and characterized by custom CGH-array and quantitative PCR to define their boundaries. The deletions, ranging from 2.9 to 180 kb, removed one or more loci contiguous to the STK11 gene in six patients, while partial STK11 gene deletions were present in the remaining nine cases. By means of DNA sequencing, we were able to precisely characterize the breakpoints in each case. Of the 30 breakpoints, 16 were located in Alu elements, revealing non-allelic homologous recombination (NAHR) as the putative mechanism for the deletions of the STK11 gene, which lies in a region with high Alu density. In the remaining cases, other mechanisms could be hypothesized, such as microhomology-mediated end-joining (MMEJ) or non-homologous end-joining (NHEJ). In conclusion, we demonstrate here the non-random occurrence of large deletions associated with PJS. All our patients had a classical PJS phenotype, which shows that haploinsufficiency for SBNO2, C19orf26, ATP5D, MIDN, C19orf23, CIRBP, C19orf24, and EFNA2 does not apparently affect their clinical phenotype.
Affiliation(s)
- Nicoletta Resta
- Dipartimento di Biomedicina dell'Età Evolutiva, Sezione di Genetica Medica, Università di Bari Aldo Moro, Policlinico Piazza G. Cesare 11, 70124, Bari, Italy.
|
465
|
Castellana N, Bafna V. Proteogenomics to discover the full coding content of genomes: a computational perspective. J Proteomics 2010; 73:2124-35. [PMID: 20620248 DOI: 10.1016/j.jprot.2010.06.007] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Received: 03/24/2010] [Revised: 06/04/2010] [Accepted: 06/21/2010] [Indexed: 11/16/2022]
Abstract
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice-forms including alternative splicing, and also, prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes, and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.
Affiliation(s)
- Natalie Castellana
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
|
466
|
The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010; 465:473-7. [PMID: 20505728 DOI: 10.1038/nature09004] [Citation(s) in RCA: 385] [Impact Index Per Article: 25.7] [Received: 12/07/2009] [Accepted: 03/10/2010] [Indexed: 01/11/2023]
Abstract
Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
|
467
|
Chmielecki J, Peifer M, Jia P, Socci ND, Hutchinson K, Viale A, Zhao Z, Thomas RK, Pao W. Targeted next-generation sequencing of DNA regions proximal to a conserved GXGXXG signaling motif enables systematic discovery of tyrosine kinase fusions in cancer. Nucleic Acids Res 2010; 38:6985-96. [PMID: 20587502 PMCID: PMC2978357 DOI: 10.1093/nar/gkq579] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 01/22/2023] Open
Abstract
Tyrosine kinase (TK) fusions are attractive drug targets in cancers. However, rapid identification of these lesions has been hampered by experimental limitations. Our in silico analysis of known cancer-derived TK fusions revealed that most breakpoints occur within a defined region upstream of a conserved GXGXXG kinase motif. We therefore designed a novel DNA-based targeted sequencing approach to screen systematically for fusions within the 90 human TKs; it should detect 92% of known TK fusions. We deliberately paired ‘in-solution’ DNA capture with 454 sequencing to minimize starting material requirements, take advantage of long sequence reads, and facilitate mapping of fusions. To validate this platform, we analyzed genomic DNA from thyroid cancer cells (TPC-1) and leukemia cells (KG-1) with fusions known only at the mRNA level. We readily identified for the first time the genomic fusion sequences of CCDC6-RET in TPC-1 cells and FGFR1OP2-FGFR1 in KG-1 cells. These data demonstrate the feasibility of this approach to identify TK fusions across multiple human cancers in a high-throughput, unbiased manner. This method is distinct from other similar efforts, because it focuses specifically on targets with therapeutic potential, uses only 1.5 µg of DNA, and circumvents the need for complex computational sequence analysis.
Affiliation(s)
- Juliann Chmielecki
- Weill Graduate School of Medical Sciences, Cornell University, New York, NY 10021, USA
|
468
|
Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, Duncan C, Antipova A, Lee C, McKernan K, De La Vega FM, Kinzler KW, Vogelstein B, Diaz LA, Velculescu VE. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2010. [PMID: 20371490 DOI: 10.1126/scitranslmed.300070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/11/2022]
Abstract
Clinical management of human cancer is dependent on the accurate monitoring of residual and recurrent tumors. The evaluation of patient-specific translocations in leukemias and lymphomas has revolutionized diagnostics for these diseases. We have developed a method, called personalized analysis of rearranged ends (PARE), which can identify translocations in solid tumors. Analysis of four colorectal and two breast cancers with massively parallel sequencing revealed an average of nine rearranged sequences (range, 4 to 15) per tumor. Polymerase chain reaction with primers spanning the breakpoints was able to detect mutant DNA molecules present at levels lower than 0.001% and readily identified mutated circulating DNA in patient plasma samples. This method provides an exquisitely sensitive and broadly applicable approach for the development of personalized biomarkers to enhance the clinical management of cancer patients.
Affiliation(s)
- Rebecca J Leary
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
|
469
|
Potenzial und Herausforderungen der personalisierten Genomik und des 1000-Genom-Projekts [Potential and challenges of personalized genomics and the 1000 Genomes Project]. MED GENET-BERLIN 2010. [DOI: 10.1007/s11825-010-0220-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
The recent sequencing of individual human genomes with novel technologies has opened a new chapter in human genetics. The 1000 Genomes Project (1000GP) aims to analyze the genomes of 2500 individuals and will substantially expand our knowledge of genetic variation by producing a high-resolution map of variation in humans. Within the 1000GP, both single nucleotide polymorphisms and structural variations are to be identified by sequencing in several ethnic groups. In addition, the technologies used are being tested for their suitability for projects of this scale. Finally, new bioinformatic solutions are to be developed so that the 1000GP data can be processed efficiently for research. In the near future, this new catalog of common and rare genetic variants will enable the development of improved methods for phenotype association and for determining the molecular causes of various diseases.
|
470
|
Chen JM, Cooper DN, Férec C, Kehrer-Sawatzki H, Patrinos GP. Genomic rearrangements in inherited disease and cancer. Semin Cancer Biol 2010; 20:222-33. [PMID: 20541013 DOI: 10.1016/j.semcancer.2010.05.007] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.9] [Received: 02/11/2010] [Revised: 04/22/2010] [Accepted: 05/19/2010] [Indexed: 10/19/2022]
Abstract
Genomic rearrangements in inherited disease and cancer involve gross alterations of chromosomes or large chromosomal regions and can take the form of deletions, duplications, insertions, inversions or translocations. The characterization of a considerable number of rearrangement breakpoints has now been accomplished at the nucleotide sequence level, thereby providing an invaluable resource for the detailed study of the mutational mechanisms which underlie genomic recombination events. A better understanding of these mutational mechanisms is vital for improving the design of mutation detection strategies. At least five categories of mutational mechanism are known to give rise to genomic rearrangements: (i) homologous recombination including non-allelic homologous recombination (NAHR), gene conversion, single strand annealing (SSA) and break-induced replication (BIR), (ii) non-homologous end joining (NHEJ), (iii) microhomology-mediated replication-dependent recombination (MMRDR), (iv) long interspersed element-1 (LINE-1 or L1)-mediated retrotransposition and (v) telomere healing. Focussing on the first three of these general mechanisms, we compare and contrast their hallmark characteristics, and discuss the role of various local DNA sequence features (e.g. recombination-promoting motifs, repetitive sequences and sequences capable of non-B DNA formation) in mediating the recombination events that underlie gross genomic rearrangements. Finally, we explore how studies both at the level of the gene (using the neurofibromatosis type-1 gene as an example) and the whole genome (using data derived from cancer genome sequencing studies) are shaping our understanding of the impact of genomic rearrangements as a cause of human genetic disease.
Affiliation(s)
- Jian-Min Chen
- Etablissement Français du Sang (EFS) - Bretagne, Brest, France.
|
471
|
Wood HM, Belvedere O, Conway C, Daly C, Chalkley R, Bickerdike M, McKinley C, Egan P, Ross L, Hayward B, Morgan J, Davidson L, MacLennan K, Ong TK, Papagiannopoulos K, Cook I, Adams DJ, Taylor GR, Rabbitts P. Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens. Nucleic Acids Res 2010; 38:e151. [PMID: 20525786 PMCID: PMC2919738 DOI: 10.1093/nar/gkq510] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.8] [Indexed: 11/22/2022] Open
Abstract
The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.
Affiliation(s)
- Henry M Wood
- Leeds Institute of Molecular Medicine, St James's University Hospital, Leeds, UK.
|
472
|
Koboldt DC, Ding L, Mardis ER, Wilson RK. Challenges of sequencing human genomes. Brief Bioinform 2010; 11:484-98. [PMID: 20519329 DOI: 10.1093/bib/bbq016] [Citation(s) in RCA: 101] [Impact Index Per Article: 6.7] [Indexed: 12/16/2022] Open
Abstract
Massively parallel sequencing technologies continue to alter the study of human genetics. As the cost of sequencing declines, next-generation sequencing (NGS) instruments and datasets will become increasingly accessible to the wider research community. Investigators are understandably eager to harness the power of these new technologies. Sequencing human genomes on these platforms, however, presents numerous production and bioinformatics challenges. Production issues like sample contamination, library chimaeras and variable run quality have become increasingly problematic in the transition from technology development lab to production floor. Analysis of NGS data, too, remains challenging, particularly given the short-read lengths (35-250 bp) and sheer volume of data. The development of streamlined, highly automated pipelines for data analysis is critical for transition from technology adoption to accelerated research and publication. This review aims to describe the state of current NGS technologies, as well as the strategies that enable NGS users to characterize the full spectrum of DNA sequence variation in humans.
Affiliation(s)
- Daniel C Koboldt
- The Genome Center at Washington University, St. Louis, Missouri 63108, USA.
|
473
|
Atanur SS, Birol İ, Guryev V, Hirst M, Hummel O, Morrissey C, Behmoaras J, Fernandez-Suarez XM, Johnson MD, McLaren WM, Patone G, Petretto E, Plessy C, Rockland KS, Rockland C, Saar K, Zhao Y, Carninci P, Flicek P, Kurtz T, Cuppen E, Pravenec M, Hubner N, Jones SJ, Birney E, Aitman TJ. The genome sequence of the spontaneously hypertensive rat: Analysis and functional significance. Genome Res 2010; 20:791-803. [PMID: 20430781 PMCID: PMC2877576 DOI: 10.1101/gr.103499.109] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 11/23/2009] [Accepted: 03/10/2010] [Indexed: 11/24/2022]
Abstract
The spontaneously hypertensive rat (SHR) is the most widely studied animal model of hypertension. Scores of SHR quantitative trait loci (QTLs) have been mapped for hypertension and other phenotypes. We have sequenced the SHR/OlaIpcv genome at 10.7-fold coverage by paired-end sequencing on the Illumina platform. We identified 3.6 million high-quality single nucleotide polymorphisms (SNPs) between the SHR/OlaIpcv and Brown Norway (BN) reference genome, with a high rate of validation (sensitivity 96.3%-98.0% and specificity 99%-100%). We also identified 343,243 short indels between the SHR/OlaIpcv and reference genomes. These SNPs and indels resulted in 161 gain or loss of stop codons and 629 frameshifts compared with the BN reference sequence. We also identified 13,438 larger deletions that result in complete or partial absence of 107 genes in the SHR/OlaIpcv genome compared with the BN reference and 588 copy number variants (CNVs) that overlap with the gene regions of 688 genes. Genomic regions containing genes whose expression had been previously mapped as cis-regulated expression quantitative trait loci (eQTLs) were significantly enriched with SNPs, short indels, and larger deletions, suggesting that some of these variants have functional effects on gene expression. Genes that were affected by major alterations in their coding sequence were highly enriched for genes related to ion transport, transport, and plasma membrane localization, providing insights into the likely molecular and cellular basis of hypertension and other phenotypes specific to the SHR strain. This near complete catalog of genomic differences between two extensively studied rat strains provides the starting point for complete elucidation, at the molecular level, of the physiological and pathophysiological phenotypic differences between individuals from these strains.
Affiliation(s)
- Santosh S. Atanur
- Physiological Genomics and Medicine Group, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, United Kingdom
- İnanç Birol
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 4S6, Canada
- Victor Guryev
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences & University Medical Centre Utrecht, Utrecht 3584 CT, The Netherlands
- Martin Hirst
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 4S6, Canada
- Oliver Hummel
- Max-Delbrück Center for Molecular Medicine, Berlin D-13092, Germany
- Catherine Morrissey
- Physiological Genomics and Medicine Group, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, United Kingdom
- Jacques Behmoaras
- Imperial College London, Division of Investigative Sciences, Hammersmith Hospital, London W12 0NN, United Kingdom
- Michelle D. Johnson
- Physiological Genomics and Medicine Group, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, United Kingdom
- William M. McLaren
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Giannino Patone
- Max-Delbrück Center for Molecular Medicine, Berlin D-13092, Germany
- Enrico Petretto
- Integrative Genomics and Medicine Group, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, United Kingdom
- Department of Epidemiology and Public Health, Faculty of Medicine, Imperial College, London W2 1PG, United Kingdom
- Charles Plessy
- Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan
- Kathleen S. Rockland
- Laboratory for Cortical Organization and Systematics, Brain Science Institute, RIKEN, Wako-shi, Saitama 351-0198, Japan
- Charles Rockland
- Advanced Technology Development Group, Brain Science Institute, RIKEN, Wako-shi, Saitama 351-0198, Japan
- Kathrin Saar
- Max-Delbrück Center for Molecular Medicine, Berlin D-13092, Germany
- Yongjun Zhao
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 4S6, Canada
- Piero Carninci
- Omics Science Center, RIKEN Yokohama Institute, Yokohama, Kanagawa 230-0045, Japan
- Paul Flicek
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ted Kurtz
- Department of Laboratory Medicine, University of California, San Francisco, San Francisco, California 94107, USA
- Edwin Cuppen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences & University Medical Centre Utrecht, Utrecht 3584 CT, The Netherlands
- Michal Pravenec
- Institute of Physiology, Academy of Sciences of the Czech Republic, Prague 14220, Czech Republic
- Norbert Hubner
- Max-Delbrück Center for Molecular Medicine, Berlin D-13092, Germany
- Steven J.M. Jones
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 4S6, Canada
- Ewan Birney
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, United Kingdom
- Timothy J. Aitman
- Physiological Genomics and Medicine Group, Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London W12 0NN, United Kingdom
|
474
|
Kwei KA, Kung Y, Salari K, Holcomb IN, Pollack JR. Genomic instability in breast cancer: pathogenesis and clinical implications. Mol Oncol 2010; 4:255-66. [PMID: 20434415 PMCID: PMC2904860 DOI: 10.1016/j.molonc.2010.04.001] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Received: 02/24/2010] [Revised: 03/27/2010] [Accepted: 04/02/2010] [Indexed: 10/19/2022] Open
Abstract
Breast cancer is a heterogeneous disease, appreciable by molecular markers, gene-expression profiles, and most recently, patterns of genomic alteration. In particular, genomic profiling has revealed three distinct patterns of DNA copy-number alteration: a "simple" type with few gains or losses of whole chromosome arms, an "amplifier" type with focal high-level DNA amplifications, and a "complex" type marked by numerous low-amplitude changes and copy-number transitions. The three patterns are associated with distinct gene-expression subtypes, and preferentially target different loci in the genome (implicating distinct cancer genes). Moreover, the different patterns of alteration imply distinct underlying mechanisms of genomic instability. The amplifier pattern may arise from transient telomere dysfunction, although new data suggest ongoing "amplifier" instability. The complex pattern shows similarity to breast cancers with germline BRCA1 mutation, which also exhibit "basal-like" expression profiles and complex-pattern genomes, implicating a possible defect in BRCA1-associated repair of DNA double-strand breaks. As such, targeting presumptive DNA repair defects represents a promising area of clinical investigation. Future studies should clarify the pathogenesis of breast cancers with amplifier and complex-pattern genomes, and will likely identify new therapeutic opportunities.
Affiliation(s)
- Kevin A Kwei
- Department of Pathology, Stanford University School of Medicine, CCSR-3245A, 269 Campus Drive, Stanford, CA 94305-5176, USA
|
475
|
Shiu KK, Natrajan R, Geyer FC, Ashworth A, Reis-Filho JS. DNA amplifications in breast cancer: genotypic-phenotypic correlations. Future Oncol 2010; 6:967-84. [PMID: 20528234 DOI: 10.2217/fon.10.56] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022] Open
Abstract
DNA copy number changes in cancer cells, in particular, amplifications, occur frequently, have prognostic impact and are associated with subtypes of breast cancer. Some amplicons contain well-characterized oncogenes, including 11q13 (CCND1) and 17q12 (HER2). HER2 amplification and overexpression defines the HER2+ subgroup of breast cancer patients and is both a prognostic marker for poor outcome and a predictive marker for response to anti-HER2 targeted therapies. Therefore, there is considerable interest in documenting the locations of other recurring amplifications in breast cancers as they may also provide a rich source of new biomarkers and novel therapeutic targets for these subgroups. This article focuses on the genomic profiling of breast cancer, with an emphasis on the characteristics of the amplifications found in subtypes of breast cancer, including luminal (ER+/HER2(-)), HER2+ and basal-like (ER(-)/HER2(-)), and discusses their known or potential roles in cancer biology and their clinical implications.
Affiliation(s)
- Kai-Keen Shiu
- The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, 237 Fulham Road, London SW36JB, UK
|
476
|
De S, Babu MM. Genomic neighbourhood and the regulation of gene expression. Curr Opin Cell Biol 2010; 22:326-33. [PMID: 20493676 DOI: 10.1016/j.ceb.2010.04.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 02/01/2010] [Revised: 04/12/2010] [Accepted: 04/13/2010] [Indexed: 12/31/2022]
Abstract
'Genomic neighbourhoods' or 'domains' are segments of the genome with specific characteristics associated with them (e.g. epigenetic modifications, interaction with nuclear lamina, etc.). Genomic neighbourhood influences the transcriptional activity of genes within it, and genetic manipulation or natural mutations that alter the neighbourhood of a gene have been shown to affect its expression. Several molecular mechanisms or factors (e.g. non-allelic homologous recombination, mobile elements, etc.) can alter the neighbourhood of genes. Over different time-scales such events have been suggested to contribute to speciation, introduce diversity in a population, result in germ-line and somatic mosaicism, and cause specific diseases. Understanding the role of genomic neighbourhood on gene regulation has fundamental implications for evolution, development, disease and genetic engineering.
|
477
|
Knox AK, Dhillon T, Cheng H, Tondelli A, Pecchioni N, Stockinger EJ. CBF gene copy number variation at Frost Resistance-2 is associated with levels of freezing tolerance in temperate-climate cereals. Theor Appl Genet 2010; 121:21-35. [PMID: 20213518 DOI: 10.1007/s00122-010-1288-7] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Received: 10/27/2008] [Accepted: 02/01/2010] [Indexed: 05/18/2023]
Abstract
Frost Resistance-1 (FR-1) and FR-2 are two loci affecting freezing tolerance and winter hardiness of the temperate-climate cereals. FR-1 is hypothesized to be due to the pleiotropic effects of VRN-1. FR-2 spans a cluster of C-Repeat Binding Factor (CBF) genes. These loci are genetically and functionally linked. Recent studies indicate CBF transcripts are downregulated by the VRN-1 encoded MADS-box protein or a factor in the VRN-1 pathway. Here, we report that barley genotypes 'Dicktoo' and 'Nure' carrying a vrn-H1 winter allele at VRN-H1 harbor increased copy numbers of CBF coding sequences relative to Vrn-H1 spring allele genotypes 'Morex' and 'Tremois'. Sequencing bacteriophage lambda genomic clones from these four genotypes alongside DNA blot hybridizations indicate approximately half of the eleven CBF orthologs at FR-H2 are duplicated in individual genomes. One of these duplications discriminates vrn-H1 genotypes from Vrn-H1 genotypes. The vrn-H1 winter allele genotypes harbor tandem segmental duplications through the CBF2A-CBF4B genomic region and maintain two distinct CBF2 paralogs, while the Vrn-H1 spring allele genotypes harbor single copies of CBF2 and CBF4. An additional CBF gene, CBF13, is a pseudogene interrupted by multiple nonsense codons in 'Tremois' whereas CBF13 is a complete uninterrupted coding sequence in 'Dicktoo' and 'Nure'. DNA blot hybridization with wheat DNAs reveals that greater copy numbers of CBF14 also occur in winter wheats than in spring wheats. These data indicate that variation in CBF gene copy numbers is widespread in the Triticeae and suggest selection for winter hardiness co-selects winter alleles at both VRN-1 and FR-2.
Affiliation(s)
- Andrea K Knox
- Department of Horticulture and Crop Science, The Ohio State University/Ohio Agricultural Research and Development Center, 1680 Madison Ave, Wooster, OH 44691, USA
|
478
|
Anderson MW, Schrijver I. Next generation DNA sequencing and the future of genomic medicine. Genes (Basel) 2010; 1:38-69. [PMID: 24710010 PMCID: PMC3960862 DOI: 10.3390/genes1010038] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 04/15/2010] [Revised: 05/20/2010] [Accepted: 05/21/2010] [Indexed: 12/20/2022] Open
Abstract
In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis.
Affiliation(s)
- Matthew W Anderson
- Department of Pathology, Stanford University Medical Center, 300 Pasteur Drive, Room L235, Stanford, CA 94305-5627, USA.
- Iris Schrijver
- Department of Pathology, Stanford University Medical Center, 300 Pasteur Drive, Room L235, Stanford, CA 94305-5627, USA.
|
479
|
Abstract
Recent data show that cells from many cancers exhibit massive chromosome instability. The traditional view is that the gradual accumulation of mutations in genes involved in transcriptional regulation and cell cycle controls results in tumor development. This, however, does not exclude the possibility that some mutations could be more potent than others in destabilizing the genome by targeting both chromosomal integrity and corresponding checkpoint mechanisms simultaneously. Three such examples of "single-hit" lesions potentially leading to heritable genome destabilization are discussed. They include: failure to release sister chromatid cohesion due to the incomplete proteolytic cleavage of cohesin; massive merotelic kinetochore misattachments upon condensin depletion; and chromosome under-replication. In all three cases, cells fail to detect potential chromosomal bridges before anaphase entry, indicating that there is a basic cell cycle requirement to maintain a degree of sister chromatid bridging that is not recognizable as chromosomal damage.
Affiliation(s)
- Alexander V Strunnikov
- Laboratory of Immunopathology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5640 Fishers Lane, Room 1524, Rockville, MD 20852, USA.
|
480
|
Bueno R, De Rienzo A, Dong L, Gordon GJ, Hercus CF, Richards WG, Jensen RV, Anwar A, Maulik G, Chirieac LR, Ho KF, Taillon BE, Turcotte CL, Hercus RG, Gullans SR, Sugarbaker DJ. Second generation sequencing of the mesothelioma tumor genome. PLoS One 2010; 5:e10612. [PMID: 20485525 PMCID: PMC2869344 DOI: 10.1371/journal.pone.0010612] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Received: 10/30/2009] [Accepted: 04/01/2010] [Indexed: 12/29/2022] Open
Abstract
The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.
Affiliation(s)
- Raphael Bueno
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Assunta De Rienzo
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lingsheng Dong
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Gavin J. Gordon
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- William G. Richards
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Roderick V. Jensen
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Gautam Maulik
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Lucian R. Chirieac
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Bruce E. Taillon
- 454 Life Sciences, Inc., Branford, Connecticut, United States of America
- Steven R. Gullans
- Excel Medical Ventures, Boston, Massachusetts, United States of America
- David J. Sugarbaker
- The International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Division of Thoracic Surgery, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
|
481
|
Abstract
The genetics of complex diseases has been given a tremendous boost in recent years by the introduction of high-throughput laboratory methods that make it possible to approach larger questions in larger populations and to cover the genome more comprehensively. The ability to determine genotypes of many individuals accurately and efficiently has allowed genetic studies that cover more of the variation within individual genes, instead of focusing only on one or a few coding variants, and to do so in study samples of reasonable power. Chip-based genotyping assays, combined with knowledge of the patterns of coinheritance of markers (linkage disequilibrium [LD]), have stimulated genome-wide association studies (GWAS) of complex diseases. Recent successes of GWAS in identifying specific genes that affect risk for common diseases are dramatic illustrations of how improved technology can lead to scientific breakthroughs. A key issue in high-throughput genotyping is to choose the appropriate technology for your goals and for the stage of your experiment, being cognizant of your sample numbers and resources. This article introduces some of the commonly used methods of high-throughput single-nucleotide polymorphism (SNP) genotyping for different stages of genetic studies and briefly reviews some of the high-throughput sequencing methods just coming into use. It also mentions some recent developments in "next-generation" sequencing that will enable other kinds of studies. This article is not intended to be comprehensive, and because technology in this area is rapidly changing, our comments should be taken as a starting point for further investigation.
|
482
|
Conrad DF, Bird C, Blackburne B, Lindsay S, Mamanova L, Lee C, Turner DJ, Hurles ME. Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat Genet 2010; 42:385-91. [PMID: 20364136 PMCID: PMC3428939 DOI: 10.1038/ng.564] [Citation(s) in RCA: 190] [Impact Index Per Article: 12.7] [Received: 08/27/2009] [Accepted: 03/09/2010] [Indexed: 12/30/2022]
Abstract
Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.
|
483
|
DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing. BMC Genomics 2010; 11:244. [PMID: 20398377 PMCID: PMC2867831 DOI: 10.1186/1471-2164-11-244] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 10/15/2009] [Accepted: 04/16/2010] [Indexed: 11/14/2022] Open
Abstract
Background: DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. Results: We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. Conclusion: The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.
|
484
|
Lord CJ, Ashworth A. Biology-driven cancer drug development: back to the future. BMC Biol 2010; 8:38. [PMID: 20385032 PMCID: PMC2864096 DOI: 10.1186/1741-7007-8-38] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 04/08/2010] [Accepted: 04/12/2010] [Indexed: 01/01/2023] Open
Abstract
Most of the significant recent advances in cancer treatment have been based on the great strides that have been made in our understanding of the underlying biology of the disease. Nevertheless, the exploitation of biological insight in the oncology clinic has been haphazard and we believe that this needs to be enhanced and optimized if patients are to receive maximum benefit. Here, we discuss how research has driven cancer drug development in the past and describe how recent advances in biology, technology, our conceptual understanding of cell networks and removal of some roadblocks may facilitate therapeutic advances in the (hopefully) near future.
Affiliation(s)
- Christopher J Lord
- The Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research, Fulham Road, London, SW3 6JB, UK
- Alan Ashworth
- The Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research, Fulham Road, London, SW3 6JB, UK
|
485
|
Abstract
Much of our understanding of how organisms develop and function is derived from the extraordinarily powerful, classic approach of screening for mutant organisms in which a specific biological process is disrupted. Reaping the fruits of such forward genetic screens in metazoan model systems like Drosophila, Caenorhabditis elegans, or zebrafish traditionally involves time-consuming positional cloning strategies that result in the identification of the mutant locus. Whole genome sequencing (WGS) has begun to provide an effective alternative to this approach through direct pinpointing of the molecular lesion in a mutated strain isolated from a genetic screen. Apart from significantly altering the pace and costs of genetic analysis, WGS also provides new perspectives on solving genetic problems that are difficult to tackle with conventional approaches, such as identifying the molecular basis of multigenic and complex traits.
|
486
|
Laborde RR, Novakova V, Olsen KD, Kasperbauer JL, Moore EJ, Smith DI. Expression profiles of viral responsive genes in oral and oropharyngeal cancers. Eur J Cancer 2010; 46:1153-8. [DOI: 10.1016/j.ejca.2010.01.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/06/2009] [Revised: 01/07/2010] [Accepted: 01/20/2010] [Indexed: 01/08/2023]
|
487
|
Ruan Y, Wei C. Multiplex parallel pair‐end‐ditag sequencing approaches in system biology. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2010; 2:224-234. [DOI: 10.1002/wsbm.40] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Affiliation(s)
- Yijun Ruan
- Genome Technology & Biology Group, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672
- Chia‐Lin Wei
- Genome Technology & Biology Group, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672
|
488
|
Newman S, Edwards PA. High-throughput analysis of chromosome translocations and other genome rearrangements in epithelial cancers. Genome Med 2010; 2:19. [PMID: 20236477 PMCID: PMC2873797 DOI: 10.1186/gm140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
Genes that are broken or fused by structural changes to the genome are an important class of mutation in the leukemias and sarcomas but have been largely overlooked in the common epithelial cancers. Large-scale sequencing is changing our perceptions of the cancer genome, and it is now being applied to structural changes, using the 'paired end' strategy. This reveals more clearly than before the extent to which many cancer genomes are rearranged and how much these rearrangements contribute to the mutational burden of epithelial tumors. In particular, there are probably many fusion genes, analogous to those found in leukemias, to be found in common cancers, such as breast carcinoma, and some of these will prove to be important in cancer diagnosis and treatment.
Affiliation(s)
- Scott Newman
- Hutchison-MRC Research Centre and Department of Pathology, University of Cambridge, Hills Road, Cambridge, CB2 0XZ, UK.
|
489
|
Bashir A, Lu Q, Carson D, Raphael BJ, Liu YT, Bafna V. Optimizing PCR assays for DNA-based cancer diagnostics. J Comput Biol 2010; 17:369-81. [PMID: 20377451 PMCID: PMC3213025 DOI: 10.1089/cmb.2009.0203] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Somatically acquired DNA rearrangements are characteristic of many cancers. The use of these mutations as diagnostic markers is challenging, because tumor cells are frequently admixed with normal cells, particularly in early stage tumor samples, and thus the samples contain a high background of normal DNA. Detection is further confounded by the fact that the rearrangement boundaries are not conserved across individuals, and might vary over hundreds of kilobases. Here, we present an algorithm for designing polymerase chain reaction (PCR) primers and oligonucleotide probes to assay for these variant rearrangements. Specifically, the primers and probes tile the entire genomic region surrounding a rearrangement, so as to amplify the mutant DNA over a wide range of possible breakpoints and robustly assay for the amplified signal on an array. Our solution involves the design of a complex combinatorial optimization problem, and also includes a novel alternating multiplexing strategy that makes efficient detection possible. Simulations show that we can achieve near-optimal detection in many different cases, even when the regions are highly non-symmetric. Additionally, we prove that the suggested multiplexing strategy is optimal in breakpoint detection. We applied our technique to create a custom design to assay for genomic lesions in several cancer cell-lines associated with a disruption in the CDKN2A locus. The CDKN2A deletion has highly variable boundaries across many cancers. We successfully detect the breakpoint in all cell-lines, even when the region has undergone multiple rearrangements. These results point to the development of a successful protocol for early diagnosis and monitoring of cancer. For online Supplementary Material, see www.liebertonline.com.
Affiliation(s)
- Ali Bashir
- Department of Computer Science, University of California, San Diego, La Jolla, California
- Qing Lu
- Department of Computer Science, University of California, San Diego, La Jolla, California
- Moores Cancer Center, University of California, La Jolla, California
- Dennis Carson
- Department of Computer Science, University of California, San Diego, La Jolla, California
- Moores Cancer Center, University of California, La Jolla, California
- Benjamin J. Raphael
- Department of Computer Science, Brown University, Providence, Rhode Island
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Yu-Tsueng Liu
- Department of Computer Science, University of California, San Diego, La Jolla, California
- Moores Cancer Center, University of California, La Jolla, California
- Vineet Bafna
- Department of Computer Science, University of California, San Diego, La Jolla, California
|
490
|
McCabe MT, Powell DR, Zhou W, Vertino PM. Homozygous deletion of the STK11/LKB1 locus and the generation of novel fusion transcripts in cervical cancer cells. CANCER GENETICS AND CYTOGENETICS 2010; 197:130-41. [PMID: 20193846 PMCID: PMC2837085 DOI: 10.1016/j.cancergencyto.2009.11.017] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 07/10/2009] [Revised: 11/14/2009] [Accepted: 11/25/2009] [Indexed: 01/20/2023]
Abstract
The STK11/LKB1 gene encodes a ubiquitously expressed serine/threonine kinase that is mutated in multiple sporadic cancers including non-small cell lung carcinomas, pancreatic cancers, and melanomas. LKB1 plays a role in multiple cellular functions including cell growth, cell cycle progression, metabolism, cell polarity, and migration. To date, only a limited number of studies have assessed the status of LKB1 in cervical cancers. Herein, we investigate DNA methylation, DNA mutation, and transcription at the LKB1 locus in cervical cancer cell lines. We identified homozygous deletions of 25-85kb in the HeLa and SiHa cell lines. Deletion breakpoint analysis in HeLa cells revealed that the deletion resulted from an Alu-recombination-mediated deletion (ARMD) and generated a novel LKB1 fusion transcript driven by an uncharacterized CpG island promoter located approximately 11kb upstream of LKB1. Although the homozygous deletion in SiHa cells removes the entire LKB1 gene and portions of the neighboring genes SBNO2 and c19orf26, this deletion also generates a fusion transcript driven by the c19orf26 promoter and composed of both c19orf26 and SBNO2 sequences. Further analyses of public gene expression and mutation databases suggest that LKB1 and its neighboring genes are frequently dysregulated in primary cervical cancers. Thus, homozygous deletions affecting LKB1 in cervical cancers may generate multiple fusion transcripts involving LKB1, SBNO2, and c19orf26.
Affiliation(s)
- Michael T. McCabe
- Department of Radiation Oncology, Emory University School of Medicine, Atlanta, GA 30322
- Doris R. Powell
- Department of Radiation Oncology, Emory University School of Medicine, Atlanta, GA 30322
- Wei Zhou
- Department of Hematology and Medical Oncology, Emory University School of Medicine, Atlanta, GA 30322
- Winship Cancer Institute, Emory University School of Medicine, Atlanta, GA 30322
- Paula M. Vertino
- Department of Radiation Oncology, Emory University School of Medicine, Atlanta, GA 30322
- Winship Cancer Institute, Emory University School of Medicine, Atlanta, GA 30322
|
491
|
Moynahan ME, Jasin M. Mitotic homologous recombination maintains genomic stability and suppresses tumorigenesis. Nat Rev Mol Cell Biol 2010; 11:196-207. [PMID: 20177395 PMCID: PMC3261768 DOI: 10.1038/nrm2851] [Citation(s) in RCA: 703] [Impact Index Per Article: 46.9] [Indexed: 12/18/2022]
Abstract
Mitotic homologous recombination promotes genome stability through the precise repair of DNA double-strand breaks and other lesions that are encountered during normal cellular metabolism and from exogenous insults. As a result, homologous recombination repair is essential during proliferative stages in development and during somatic cell renewal in adults to protect against cell death and mutagenic outcomes from DNA damage. Mutations in mammalian genes encoding homologous recombination proteins, including BRCA1, BRCA2 and PALB2, are associated with developmental abnormalities and tumorigenesis. Recent advances have provided a clearer understanding of the connections between these proteins and of the key steps of homologous recombination and DNA strand exchange.
|
492
|
Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, Duncan C, Antipova A, Lee C, McKernan K, De La Vega FM, Kinzler KW, Vogelstein B, Diaz LA, Velculescu VE. Development of personalized tumor biomarkers using massively parallel sequencing. Sci Transl Med 2010; 2:20ra14. [PMID: 20371490 PMCID: PMC2858564 DOI: 10.1126/scitranslmed.3000702] [Citation(s) in RCA: 391] [Impact Index Per Article: 26.1] [Indexed: 12/12/2022]
Abstract
Clinical management of human cancer is dependent on the accurate monitoring of residual and recurrent tumors. The evaluation of patient-specific translocations in leukemias and lymphomas has revolutionized diagnostics for these diseases. We have developed a method, called personalized analysis of rearranged ends (PARE), which can identify translocations in solid tumors. Analysis of four colorectal and two breast cancers with massively parallel sequencing revealed an average of nine rearranged sequences (range, 4 to 15) per tumor. Polymerase chain reaction with primers spanning the breakpoints was able to detect mutant DNA molecules present at levels lower than 0.001% and readily identified mutated circulating DNA in patient plasma samples. This approach provides an exquisitely sensitive and broadly applicable approach for the development of personalized biomarkers to enhance the clinical management of cancer patients.
Affiliation(s)
- Rebecca J. Leary
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Isaac Kinde
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Frank Diehl
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Kerstin Schmidt
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Kenneth W. Kinzler
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Bert Vogelstein
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Luis A. Diaz
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Victor E. Velculescu
- Ludwig Center for Cancer Genetics and Therapeutics and Howard Hughes Medical Institute, Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
|
493
|
Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C, Onofrio RC, Ziaugra L, Cibulskis K, Laine E, Barretina J, Winckler W, Fisher DE, Getz G, Meyerson M, Jaffe DB, Gabriel SB, Lander ES, Dummer R, Gnirke A, Nusbaum C, Garraway LA. Integrative analysis of the melanoma transcriptome. Genome Res 2010; 20:413-27. [PMID: 20179022 DOI: 10.1101/gr.103697.109] [Citation(s) in RCA: 214] [Impact Index Per Article: 14.3] [Indexed: 12/31/2022]
Abstract
Global studies of transcript structure and abundance in cancer cells enable the systematic discovery of aberrations that contribute to carcinogenesis, including gene fusions, alternative splice isoforms, and somatic mutations. We developed a systematic approach to characterize the spectrum of cancer-associated mRNA alterations through integration of transcriptomic and structural genomic data, and we applied this approach to generate new insights into melanoma biology. Using paired-end massively parallel sequencing of cDNA (RNA-seq) together with analyses of high-resolution chromosomal copy number data, we identified 11 novel melanoma gene fusions produced by underlying genomic rearrangements, as well as 12 novel readthrough transcripts. We mapped these chimeric transcripts to base-pair resolution and traced them to their genomic origins using matched chromosomal copy number information. We also used these data to discover and validate base-pair mutations that accumulated in these melanomas, revealing a surprisingly high rate of somatic mutation and lending support to the notion that point mutations constitute the major driver of melanoma progression. Taken together, these results may indicate new avenues for target discovery in melanoma, while also providing a template for large-scale transcriptome studies across many tumor types.
Affiliation(s)
- Michael F Berger
- The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|
494
|
Aparicio SAJR, Huntsman DG. Does massively parallel DNA resequencing signify the end of histopathology as we know it? J Pathol 2010; 220:307-15. [PMID: 19921711 DOI: 10.1002/path.2636] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 12/23/2022]
Abstract
Next-generation DNA sequencing devices have revolutionized cancer genomics by bringing whole genome resequencing of patients' tumours within practical and economic reach. We present an overview of the techniques involved and review early results from the resequencing of cancer genomes. The possible impacts of whole-genome and transcriptome resequencing in clinical cancer research and the practice of pathology are discussed.
|
495
|
Edwards PAW. Fusion genes and chromosome translocations in the common epithelial cancers. J Pathol 2010; 220:244-54. [PMID: 19921709 DOI: 10.1002/path.2632] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Indexed: 12/11/2022]
Abstract
It has been known for 25 years that fusion genes play a central role in leukaemias and sarcomas but they have been neglected in the common carcinomas, largely because of technical limitations of cytogenetics. In the last few years it has emerged that gene fusions, caused by chromosome translocations, inversions, deletions, etc., are important in the common epithelial cancers, such as prostate and lung carcinoma. Most prostate cancers, for example, have an androgen-regulated fusion of one of the ETS transcription factor gene family. Early results of genome-wide searches for gene fusions in breast and other epithelial cancers suggest that most individual tumours will have several fused genes. Fusion genes are exceptionally powerful mutations. In their simplest form they can turn on expression by promoter insertion but they can also, for example, force dimerization of a protein or change its subcellular location. They are correspondingly important clinically, in classification and management and as targets for therapy. This review surveys what we know of fusion genes in the carcinomas, summarizes the technical advances that now make it possible to search systematically for such genes, and concludes by putting fusion genes into the current picture of mutation in cancers.
Affiliation(s)
- Paul A W Edwards
- Department of Pathology and Hutchison/MRC Research Centre, University of Cambridge, Cambridge CB2 0XZ, UK.
|
496
|
Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations. PLoS One 2010; 5:e9317. [PMID: 20174472 PMCID: PMC2824832 DOI: 10.1371/journal.pone.0009317] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Received: 11/06/2009] [Accepted: 01/29/2010] [Indexed: 12/18/2022] Open
Abstract
Due to growing throughput and shrinking cost, massively parallel sequencing is rapidly becoming an attractive alternative to microarrays for the genome-wide study of gene expression and copy number alterations in primary tumors. The sequencing of transcripts (RNA-Seq) should offer several advantages over microarray-based methods, including the ability to detect somatic mutations and accurately measure allele-specific expression. To investigate these advantages we have applied a novel, strand-specific RNA-Seq method to tumors and matched normal tissue from three patients with oral squamous cell carcinomas. Additionally, to better understand the genomic determinants of the gene expression changes observed, we have sequenced the tumor and normal genomes of one of these patients. We demonstrate here that our RNA-Seq method accurately measures allelic imbalance and that measurement on the genome-wide scale yields novel insights into cancer etiology. As expected, the set of genes differentially expressed in the tumors is enriched for cell adhesion and differentiation functions, but, unexpectedly, the set of allelically imbalanced genes is also enriched for these same cancer-related functions. By comparing the transcriptomic perturbations observed in one patient to his underlying normal and tumor genomes, we find that allelic imbalance in the tumor is associated with copy number mutations and that copy number mutations are, in turn, strongly associated with changes in transcript abundance. These results support a model in which allele-specific deletions and duplications drive allele-specific changes in gene expression in the developing tumor.
|
497
|
Castellana NE, Pham V, Arnott D, Lill JR, Bafna V. Template proteogenomics: sequencing whole proteins using an imperfect database. Mol Cell Proteomics 2010; 9:1260-70. [PMID: 20164058 DOI: 10.1074/mcp.m900504-mcp200] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022] Open
Abstract
Database search algorithms are the primary workhorses for the identification of tandem mass spectra. However, these methods are limited to the identification of spectra for which peptides are present in the database, preventing the identification of peptides from mutated or alternatively spliced sequences. A variety of methods has been developed to search a spectrum against a sequence allowing for variations. Some tools determine the sequence of the homologous protein in the related species but do not report the peptide in the target organism. Other tools consider variations, including modifications and mutations, in reconstructing the target sequence. However, these tools will not work if the template (homologous peptide) is missing in the database, and they do not attempt to reconstruct the entire protein target sequence. De novo identification of peptide sequences is another possibility, because it does not require a protein database. However, the lack of database reduces the accuracy. We present a novel proteogenomic approach, GenoMS, that draws on the strengths of database and de novo peptide identification methods. Protein sequence templates (i.e. proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing. We used GenoMS to reconstruct the full sequence of an antibody by using spectra acquired from multiple digests using different proteases. Antibodies are a prime example of proteins that confound standard database identification techniques. The mature antibody genes result from large-scale genome rearrangements with flexible fusion boundaries and somatic hypermutation. Using GenoMS we automatically reconstruct the complete sequences of two immunoglobulin chains with accuracy greater than 98% using a diverged protein database. Using the genome as the template, we achieve accuracy exceeding 97%.
Affiliation(s)
- Natalie E Castellana
- Department of Computer Science, University of California, San Diego, San Diego, California 92093, USA
|
498
|
Mir KU. Sequencing genomes: from individuals to populations. Briefings in Functional Genomics and Proteomics 2010; 8:367-78. [PMID: 19808932 DOI: 10.1093/bfgp/elp040] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
The whole genome sequences of Jim Watson and Craig Venter are early examples of personalized genomics, which promises to change how we approach healthcare in the future. Before personal sequencing can have practical medical benefits, however, and before it should be advocated for implementation at the population scale, there needs to be a better understanding of which genetic variants influence which traits and how their effects are modified by epigenetic factors. Nonetheless, for forging links between DNA sequence and phenotype, efforts to sequence the genomes of individuals need to continue; this includes sequencing sub-populations for association studies which analyse the difference in sequence between disease-affected and unaffected individuals. Such studies can only be applied on a large enough scale to be effective if the massive strides in sequencing technology that have recently occurred also continue.
Affiliation(s)
- Kalim U Mir
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
|
499
|
Lefort N, Perrier AL, Laâbi Y, Varela C, Peschanski M. Human embryonic stem cells and genomic instability. Regen Med 2010; 4:899-909. [PMID: 19903007 DOI: 10.2217/rme.09.63] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022] Open
Abstract
Owing to their original properties, pluripotent human embryonic stem cells (hESCs) and their progenies are highly valuable not only for regenerative medicine, but also as tools to study development and pathologies or as cellular substrates to screen and test new drugs. However, ensuring their genomic integrity is one important prerequisite for both research and therapeutic applications. Until recently, several studies about the genomic stability of cultured hESCs had described chromosomal or other large genomic alterations detectable with conventional karyotypic methods. In the past year, several laboratories have reported many small genomic alterations, in the megabase-sized range, using more sensitive karyotyping methods, showing that hESCs are prone to acquire focal genomic abnormalities in culture. As these alterations were found to be nonrandom, these findings strongly advocate for high-resolution monitoring of human pluripotent stem cell lines, especially when intended to be used for clinical applications.
Affiliation(s)
- Nathalie Lefort
- Institute for Stem cell Therapy and Exploration of Monogenic diseases, Desbruères, 91030 Evry cedex, France.
|
500
|
Zecchin D, Bardelli A. Tracking the genomic evolution of breast cancer metastasis. Breast Cancer Res 2010; 12:302. [PMID: 20156320 PMCID: PMC2880424 DOI: 10.1186/bcr2469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Therapeutic choices for metastatic tumors are, in most cases, based upon the histological and molecular analysis of the corresponding primary tumor. Understanding whether and to what extent the genomic landscape of metastases differs from that of the tumors from which they originated is critical yet largely unknown. A recent report tackled this key issue by comparing the genomic and transcriptional profile of a metastatic lobular breast tumor with that of the primary tumor surgically removed 9 years earlier. The extent of the differences suggests a high degree of mutational heterogeneity between primary and metastatic lesions and indicates that significant evolution occurs during breast cancer progression.
Affiliation(s)
- Davide Zecchin
- Laboratory of Molecular Genetics, Institute for Cancer Research and Treatment, University of Turin Medical School, SP 142, km 3,95, I-10060 Candiolo, Turin, Italy
|