251
|
Zhang Q, Zeng X, Younkin S, Kawli T, Snyder MP, Keleş S. Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection. BMC Bioinformatics 2016; 17:96. [PMID: 26908256 PMCID: PMC4765064 DOI: 10.1186/s12859-016-0957-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/09/2015] [Accepted: 02/19/2016] [Indexed: 01/22/2023] Open
Abstract
Background: Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36–50 bps), long (75–100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. Results: We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data as well as fully simulated data revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Conclusions: Our work elucidates the effect of design on the downstream analysis and provides insights for investigators deciding on sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
Affiliation(s)
- Qi Zhang
- Department of Statistics, University of Nebraska Lincoln, Lincoln, Nebraska, USA.
| | - Xin Zeng
- Department of Statistics, University of Wisconsin Madison, Madison, Wisconsin, USA
| | - Sam Younkin
- Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, Madison, Wisconsin, USA
| | - Trupti Kawli
- Department of Genetics, Stanford University School of Medicine, Palo Alto, California, USA.
| | - Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, California, USA; Stanford Center for Genomics and Personalized Medicine, Palo Alto, California, USA.
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin Madison, Madison, Wisconsin, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin Madison, Madison, Wisconsin, USA.
|
252
|
Boeva V. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells. Front Genet 2016; 7:24. [PMID: 26941778 PMCID: PMC4763482 DOI: 10.3389/fgene.2016.00024] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Received: 10/30/2015] [Accepted: 02/05/2016] [Indexed: 12/27/2022] Open
Abstract
Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation.
Affiliation(s)
- Valentina Boeva
- Centre de Recherche, Institut Curie, Paris, France; INSERM, U900, Paris, France; Mines ParisTech, Fontainebleau, France; PSL Research University, Paris, France; Department of Development, Reproduction and Cancer, Institut Cochin, Paris, France; INSERM, U1016, Paris, France; Centre National de la Recherche Scientifique UMR 8104, Paris, France; Université Paris Descartes UMR-S1016, Paris, France
|
253
|
Development of a Comprehensive Sequencing Assay for Inherited Cardiac Condition Genes. J Cardiovasc Transl Res 2016; 9:3-11. [PMID: 26888179 PMCID: PMC4767849 DOI: 10.1007/s12265-016-9673-5] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 10/15/2015] [Accepted: 01/07/2016] [Indexed: 12/15/2022]
Abstract
Inherited cardiac conditions (ICCs) are characterised by marked genetic and allelic heterogeneity and require extensive sequencing for genetic characterisation. We iteratively optimised a targeted gene capture panel for ICCs that includes disease-causing, putatively pathogenic, research and phenocopy genes (n = 174 genes). We achieved high coverage of the target region on both MiSeq (>99.8 % at ≥20× read depth, n = 12) and NextSeq (>99.9 % at ≥20×, n = 48) platforms with 100 % sensitivity and precision for single nucleotide variants and indels across the protein-coding target on the MiSeq. In the final assay, 40 out of 43 established ICC genes informative in clinical practice achieved complete coverage (100 % at ≥20×). By comparison, whole exome sequencing (WES; ∼80×), deep WES (∼500×) and whole genome sequencing (WGS; ∼70×) had poorer performance (88.1, 99.2 and 99.3 % respectively at ≥20×) across the ICC target. The assay described here delivers highly accurate and affordable sequencing of ICC genes, complemented by accessible cloud-based computation and informatics. See Editorial in this issue (DOI: 10.1007/s12265-015-9667-8).
|
254
|
Martin MD, Vieira FG, Ho SYW, Wales N, Schubert M, Seguin-Orlando A, Ristaino JB, Gilbert MTP. Genomic Characterization of a South American Phytophthora Hybrid Mandates Reassessment of the Geographic Origins of Phytophthora infestans. Mol Biol Evol 2016; 33:478-91. [PMID: 26576850 PMCID: PMC4866541 DOI: 10.1093/molbev/msv241] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 12/19/2022] Open
Abstract
As the oomycete pathogen causing potato late blight disease, Phytophthora infestans triggered the famous 19th-century Irish potato famine and remains the leading cause of global commercial potato crop destruction. But the geographic origin of the genotype that caused this devastating initial outbreak remains disputed, as does the New World center of origin of the species itself. Both Mexico and South America have been proposed, generating considerable controversy. Here, we readdress the pathogen's origins using a genomic data set encompassing 71 globally sourced modern and historical samples of P. infestans and the hybrid species P. andina, a close relative known only from the Andean highlands. Previous studies have suggested that the nuclear DNA lineage behind the initial outbreaks in Europe in 1845 is now extinct. Analysis of P. andina's phased haplotypes recovered eight haploid genome sequences, four of which represent a previously unknown basal lineage of P. infestans closely related to the famine-era lineage. Our analyses further reveal that clonal lineages of both P. andina and historical P. infestans diverged earlier than modern Mexican lineages, casting doubt on recent claims of a Mexican center of origin. Finally, we use haplotype phasing to demonstrate that basal branches of the clade comprising Mexican samples are occupied by clonal isolates collected from wild Solanum hosts, suggesting that modern Mexican P. infestans diversified on Solanum tuberosum after a host jump from a wild species and that the origins of P. infestans are more complex than was previously thought.
Affiliation(s)
- Michael D Martin
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark; Department of Integrative Biology, Center for Theoretical Evolutionary Genomics, University of California, Berkeley
| | - Filipe G Vieira
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, NSW, Australia
| | - Nathan Wales
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Mikkel Schubert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Andaine Seguin-Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Jean B Ristaino
- Department of Plant Pathology, North Carolina State University
| | - M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark; Trace and Environmental DNA Laboratory, Department of Environment and Agriculture, Curtin University, Perth, WA, Australia
|
255
|
Smagulova F, Brick K, Pu Y, Camerini-Otero RD, Petukhova GV. The evolutionary turnover of recombination hot spots contributes to speciation in mice. Genes Dev 2016; 30:266-80. [PMID: 26833728 PMCID: PMC4743057 DOI: 10.1101/gad.270009.115] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 09/09/2015] [Accepted: 12/15/2015] [Indexed: 01/12/2023]
Abstract
Meiotic recombination is required for the segregation of homologous chromosomes and is essential for fertility. In most mammals, the DNA double-strand breaks (DSBs) that initiate meiotic recombination are directed to a subset of genomic loci (hot spots) by sequence-specific binding of the PRDM9 protein. Rapid evolution of the DNA-binding specificity of PRDM9 and gradual erosion of PRDM9-binding sites by gene conversion will alter the recombination landscape over time. To better understand the evolutionary turnover of recombination hot spots and its consequences, we mapped DSB hot spots in four major subspecies of Mus musculus with different Prdm9 alleles and in their F1 hybrids. We found that hot spot erosion governs the preferential usage of some Prdm9 alleles over others in hybrid mice and increases sequence diversity specifically at hot spots that become active in the hybrids. As crossovers are disfavored at such hot spots, we propose that sequence divergence generated by hot spot turnover may create an impediment for recombination in hybrids, potentially leading to reduced fertility and, eventually, speciation.
Affiliation(s)
- Fatima Smagulova
- Department of Biochemistry and Molecular Biology, Uniformed Services University of Health Sciences, Bethesda, Maryland 20814, USA
| | - Kevin Brick
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20814, USA
| | - Yongmei Pu
- Department of Biochemistry and Molecular Biology, Uniformed Services University of Health Sciences, Bethesda, Maryland 20814, USA
| | - R Daniel Camerini-Otero
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20814, USA
| | - Galina V Petukhova
- Department of Biochemistry and Molecular Biology, Uniformed Services University of Health Sciences, Bethesda, Maryland 20814, USA
|
256
|
Meienberg J, Bruggmann R, Oexle K, Matyas G. Clinical sequencing: is WGS the better WES? Hum Genet 2016; 135:359-62. [PMID: 26742503 PMCID: PMC4757617 DOI: 10.1007/s00439-015-1631-9] [Citation(s) in RCA: 215] [Impact Index Per Article: 26.9] [Received: 09/01/2015] [Accepted: 12/25/2015] [Indexed: 11/26/2022]
Abstract
Current clinical next-generation sequencing is done by using gene panels and exome analysis, both of which involve selective capturing of target regions. However, capturing has limitations in sufficiently covering coding exons, especially GC-rich regions. We compared whole exome sequencing (WES) with the most recent PCR-free whole genome sequencing (WGS), showing that only the latter is able to provide hitherto unprecedented complete coverage of the coding region of the genome. Thus, from a clinical/technical point of view, WGS is the better WES so that capturing is no longer necessary for the most comprehensive genomic testing of Mendelian disorders.
Affiliation(s)
- Janine Meienberg
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, 8952, Schlieren-Zurich, Switzerland
| | - Rémy Bruggmann
- Interfaculty Bioinformatics Unit and Swiss Institute of Bioinformatics, University of Berne, 3012, Berne, Switzerland
| | - Konrad Oexle
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, 8952, Schlieren-Zurich, Switzerland
| | - Gabor Matyas
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, 8952, Schlieren-Zurich, Switzerland.
- Zurich Center for Integrative Human Physiology, University of Zurich, 8057, Zurich, Switzerland.
|
257
|
Ewing AD. Transposable element detection from whole genome sequence data. Mob DNA 2015; 6:24. [PMID: 26719777 PMCID: PMC4696183 DOI: 10.1186/s13100-015-0055-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 09/15/2015] [Accepted: 12/21/2015] [Indexed: 11/25/2022] Open
Abstract
The number of software tools available for detecting transposable element insertions from whole genome sequence data has been increasing steadily throughout the last ~5 years. Some of these methods have unique features suiting them for particular use cases, but in general they follow one or more of a common set of approaches. Here, detection and filtering approaches are reviewed in the light of transposable element biology and the current state of whole genome sequencing. We demonstrate that the current state-of-the-art methods still do not produce highly concordant results and provide resources to assist future development in transposable element detection methods.
Affiliation(s)
- Adam D Ewing
- Mater Research Institute - University of Queensland, 37 Kent St Level 4, Woolloongabba, QLD 4102 Australia
|
258
|
Li Y, Lu Y, Polak U, Lin K, Shen J, Farmer J, Seyer L, Bhalla AD, Rozwadowska N, Lynch DR, Butler JS, Napierala M. Expanded GAA repeats impede transcription elongation through the FXN gene and induce transcriptional silencing that is restricted to the FXN locus. Hum Mol Genet 2015; 24:6932-43. [PMID: 26401053 PMCID: PMC4654050 DOI: 10.1093/hmg/ddv397] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/10/2015] [Accepted: 09/21/2015] [Indexed: 11/13/2022] Open
Abstract
Friedreich's ataxia (FRDA) is a severe neurodegenerative disease caused by homozygous expansion of the guanine-adenine-adenine (GAA) repeats in intron 1 of the FXN gene leading to transcriptional repression of frataxin expression. Post-translational histone modifications that typify heterochromatin are enriched in the vicinity of the repeats, whereas active chromatin marks in this region are underrepresented in FRDA samples. Yet, the immediate effect of the expanded repeats on transcription progression through FXN and their long-range effect on the surrounding genomic context are two critical questions that remain unanswered in the molecular pathogenesis of FRDA. To address these questions, we conducted next-generation RNA sequencing of a large cohort of FRDA and control primary fibroblasts. This comprehensive analysis revealed that the GAA-induced silencing effect does not influence expression of neighboring genes upstream or downstream of FXN. Furthermore, no long-range silencing effects were detected across a large portion of chromosome 9. Additionally, results of chromatin immunoprecipitation studies confirmed that histone modifications associated with repressed transcription are confined to the FXN locus. Finally, deep sequencing of FXN pre-mRNA molecules revealed a pronounced defect in the transcription elongation rate in FRDA cells when compared with controls. These results indicate that approaches aimed to reactivate frataxin expression should simultaneously address deficits in transcription initiation and elongation at the FXN locus.
Affiliation(s)
- Yanjie Li
- Department of Biochemistry and Molecular Genetics, UAB Stem Cell Institute, University of Alabama at Birmingham, 1825 University Blvd., Birmingham, AL 35294, USA
| | - Yue Lu
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Science Park, Smithville, TX 78957, USA
| | - Urszula Polak
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Science Park, Smithville, TX 78957, USA, Department of Cell Biology, Poznan University of Medical Sciences, Rokietnicka 5D, Poznan 60-806, Poland
| | - Kevin Lin
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Science Park, Smithville, TX 78957, USA
| | - Jianjun Shen
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Science Park, Smithville, TX 78957, USA
| | - Jennifer Farmer
- Division of Neurology and Pediatrics, Children's Hospital of Philadelphia, Abramson Research Center Room 502, Philadelphia, PA 19104, USA
| | - Lauren Seyer
- Division of Neurology and Pediatrics, Children's Hospital of Philadelphia, Abramson Research Center Room 502, Philadelphia, PA 19104, USA
| | - Angela D Bhalla
- Department of Biochemistry and Molecular Genetics, UAB Stem Cell Institute, University of Alabama at Birmingham, 1825 University Blvd., Birmingham, AL 35294, USA
| | - Natalia Rozwadowska
- Department of Biochemistry and Molecular Genetics, UAB Stem Cell Institute, University of Alabama at Birmingham, 1825 University Blvd., Birmingham, AL 35294, USA, Institute of Human Genetics, Polish Academy of Science, Strzeszynska 32, Poznan 60-479, Poland
| | - David R Lynch
- Division of Neurology and Pediatrics, Children's Hospital of Philadelphia, Abramson Research Center Room 502, Philadelphia, PA 19104, USA
| | - Jill Sergesketter Butler
- Department of Biochemistry and Molecular Genetics, UAB Stem Cell Institute, University of Alabama at Birmingham, 1825 University Blvd., Birmingham, AL 35294, USA
| | - Marek Napierala
- Department of Biochemistry and Molecular Genetics, UAB Stem Cell Institute, University of Alabama at Birmingham, 1825 University Blvd., Birmingham, AL 35294, USA, Department of Molecular Biomedicine, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
|
259
|
Roberts NJ, Norris AL, Petersen GM, Bondy ML, Brand R, Gallinger S, Kurtz RC, Olson SH, Rustgi AK, Schwartz AG, Stoffel E, Syngal S, Zogopoulos G, Ali SZ, Axilbund J, Chaffee KG, Chen YC, Cote ML, Childs EJ, Douville C, Goes FS, Herman JM, Iacobuzio-Donahue C, Kramer M, Makohon-Moore A, McCombie RW, McMahon KW, Niknafs N, Parla J, Pirooznia M, Potash JB, Rhim AD, Smith AL, Wang Y, Wolfgang CL, Wood LD, Zandi PP, Goggins M, Karchin R, Eshleman JR, Papadopoulos N, Kinzler KW, Vogelstein B, Hruban RH, Klein AP. Whole Genome Sequencing Defines the Genetic Heterogeneity of Familial Pancreatic Cancer. Cancer Discov 2015; 6:166-75. [PMID: 26658419 DOI: 10.1158/2159-8290.cd-15-0402] [Citation(s) in RCA: 259] [Impact Index Per Article: 28.8] [Received: 04/03/2015] [Accepted: 12/02/2015] [Indexed: 12/14/2022]
Abstract
Pancreatic cancer is projected to become the second leading cause of cancer-related death in the United States by 2020. A familial aggregation of pancreatic cancer has been established, but the cause of this aggregation in most families is unknown. To determine the genetic basis of susceptibility in these families, we sequenced the germline genomes of 638 patients with familial pancreatic cancer and the tumor exomes of 39 familial pancreatic adenocarcinomas. Our analyses support the role of previously identified familial pancreatic cancer susceptibility genes such as BRCA2, CDKN2A, and ATM, and identify novel candidate genes harboring rare, deleterious germline variants for further characterization. We also show how somatic point mutations that occur during hematopoiesis can affect the interpretation of genome-wide studies of hereditary traits. Our observations have important implications for the etiology of pancreatic cancer and for the identification of susceptibility genes in other common cancer types. SIGNIFICANCE: The genetic basis of disease susceptibility in the majority of patients with familial pancreatic cancer is unknown. We whole genome sequenced 638 patients with familial pancreatic cancer and demonstrate that the genetic underpinning of inherited pancreatic cancer is highly heterogeneous. This has significant implications for the management of patients with familial pancreatic cancer.
Affiliation(s)
- Nicholas J Roberts
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland.
| | - Alexis L Norris
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Gloria M Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
| | - Melissa L Bondy
- Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, Texas
| | - Randall Brand
- Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Steven Gallinger
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
| | - Robert C Kurtz
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
| | - Sara H Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
| | - Anil K Rustgi
- Division of Gastroenterology, Departments of Medicine and Genetics, Pancreatic Cancer Translational Center of Excellence, Abramson Cancer Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania
| | - Ann G Schwartz
- Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan
| | - Elena Stoffel
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan
| | - Sapna Syngal
- Population Sciences Division, Dana-Farber Cancer Institute, and Gastroenterology Division, Brigham and Women's Hospital, Boston, Massachusetts
| | - George Zogopoulos
- The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada. Goodman Cancer Research Centre, McGill University, Montreal, Quebec, Canada
| | - Syed Z Ali
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Jennifer Axilbund
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Kari G Chaffee
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
| | - Yun-Ching Chen
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Michele L Cote
- Karmanos Cancer Institute, Wayne State University School of Medicine, Detroit, Michigan
| | - Erica J Childs
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
| | - Christopher Douville
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Fernando S Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Joseph M Herman
- Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | | | - Melissa Kramer
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
| | - Alvin Makohon-Moore
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Richard W McCombie
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
| | - K Wyatt McMahon
- Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Noushin Niknafs
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Jennifer Parla
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. inGenious Targeting Laboratory, Ronkonkoma, New York
| | - Mehdi Pirooznia
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - James B Potash
- Department of Psychiatry, University of Iowa, Iowa City, Iowa
| | - Andrew D Rhim
- Division of Gastroenterology, Departments of Medicine and Genetics, Pancreatic Cancer Translational Center of Excellence, Abramson Cancer Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania. Department of Medicine, University of Michigan, Ann Arbor, Michigan
| | - Alyssa L Smith
- The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada. Goodman Cancer Research Centre, McGill University, Montreal, Quebec, Canada
| | - Yuxuan Wang
- Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Christopher L Wolfgang
- Department of Surgery, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Laura D Wood
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Peter P Zandi
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Michael Goggins
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Medicine, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Rachel Karchin
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - James R Eshleman
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Nickolas Papadopoulos
- Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland
| | - Kenneth W Kinzler
- Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland.
| | - Bert Vogelstein
- Ludwig Center and the Howard Hughes Medical Institute, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland.
| | - Ralph H Hruban
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland.
| | - Alison P Klein
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland. Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. Department of Oncology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins Medical Institutions, Baltimore, Maryland.
|
260
|
Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, Heisler LE, Beck TA, Simpson JT, Tonon L, Sertier AS, Patch AM, Jäger N, Ginsbach P, Drews R, Paramasivam N, Kabbe R, Chotewutmontri S, Diessl N, Previti C, Schmidt S, Brors B, Feuerbach L, Heinold M, Gröbner S, Korshunov A, Tarpey PS, Butler AP, Hinton J, Jones D, Menzies A, Raine K, Shepherd R, Stebbings L, Teague JW, Ribeca P, Giner FC, Beltran S, Raineri E, Dabad M, Heath SC, Gut M, Denroche RE, Harding NJ, Yamaguchi TN, Fujimoto A, Nakagawa H, Quesada V, Valdés-Mas R, Nakken S, Vodák D, Bower L, Lynch AG, Anderson CL, Waddell N, Pearson JV, Grimmond SM, Peto M, Spellman P, He M, Kandoth C, Lee S, Zhang J, Létourneau L, Ma S, Seth S, Torrents D, Xi L, Wheeler DA, López-Otín C, Campo E, Campbell PJ, Boutros PC, Puente XS, Gerhard DS, Pfister SM, McPherson JD, Hudson TJ, Schlesner M, Lichter P, Eils R, Jones DTW, Gut IG. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun 2015; 6:10001. [PMID: 26647970 PMCID: PMC4682041 DOI: 10.1038/ncomms10001] [Citation(s) in RCA: 207] [Impact Index Per Article: 23.0] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 12/13/2022] Open
Abstract
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.
Affiliation(s)
- Tyler S. Alioto
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Ivo Buchhalter
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Sophia Derdak
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Barbara Hutter
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Matthew D. Eldridge
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Department of Informatics, University of Oslo, 0373 Oslo, Norway
| | - Lawrence E. Heisler
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Timothy A. Beck
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Jared T. Simpson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Laurie Tonon
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
| | - Anne-Sophie Sertier
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
| | - Ann-Marie Patch
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - Natalie Jäger
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Department of Genetics, Stanford University, Mail Stop-5120, Stanford, California 94305-5120, USA
| | - Philip Ginsbach
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Ruben Drews
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Nagarajan Paramasivam
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Rolf Kabbe
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Sasithorn Chotewutmontri
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Nicolle Diessl
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Christopher Previti
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Sabine Schmidt
- Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
| | - Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Lars Feuerbach
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Michael Heinold
- Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Susanne Gröbner
- Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
| | - Andrey Korshunov
- Department of Neuropathology, Heidelberg University Hospital, Im Neuenheimer Feld 224, Heidelberg 69120, Germany
| | | | - Adam P. Butler
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Jonathan Hinton
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - David Jones
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Andrew Menzies
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Keiran Raine
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Rebecca Shepherd
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Lucy Stebbings
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Jon W. Teague
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Paolo Ribeca
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Francesc Castro Giner
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Emanuele Raineri
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Marc Dabad
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Simon C. Heath
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
| | - Robert E. Denroche
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Nicholas J. Harding
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Takafumi N. Yamaguchi
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
| | - Akihiro Fujimoto
- RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Hidewaki Nakagawa
- RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Víctor Quesada
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Rafael Valdés-Mas
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Sigve Nakken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
| | - Daniel Vodák
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- The Bioinformatics Core Facility, Institute for Cancer Genetics and Informatics, Oslo University Hospital, 0310 Oslo, Norway
| | - Lawrence Bower
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Andrew G. Lynch
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Charlotte L. Anderson
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Victorian Life Sciences Computation Initiative, The University of Melbourne, Melbourne, Victoria 3053, Australia
| | - Nicola Waddell
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - John V. Pearson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
| | - Sean M. Grimmond
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
- WolfsonWohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland G61 1QH, UK
| | - Myron Peto
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
| | - Paul Spellman
- Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
| | | | - Cyriac Kandoth
- The Genome Institute, Washington University, St Louis, Missouri 63108, USA
| | - Semin Lee
- Harvard Medical School, Boston, Massachusetts 02115, USA
| | - John Zhang
- Harvard Medical School, Boston, Massachusetts 02115, USA
- MD Anderson Cancer Center, Houston, Texas 77030, USA
| | | | - Singer Ma
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
| | - Sahil Seth
- MD Anderson Cancer Center, Houston, Texas 77030, USA
| | - David Torrents
- IRB-BSC Joint Research Program on Computational Biology, Barcelona Supercomputing Center, 08034 Barcelona, Spain
| | - Liu Xi
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
| | - David A. Wheeler
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
| | - Carlos López-Otín
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Elías Campo
- Hematopathology Unit, Department of Pathology, Hospital Clinic, University of Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer, 08036 Barcelona, Spain
| | | | - Paul C. Boutros
- Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
| | - Xose S. Puente
- Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
| | - Daniela S. Gerhard
- National Cancer Institute, Office of Cancer Genomics, 31 Center Drive, 10A07, Bethesda, Maryland 20892-2580, USA
| | - Stefan M. Pfister
- Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - John D. McPherson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
| | - Thomas J. Hudson
- Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| | - Matthias Schlesner
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Peter Lichter
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Roland Eils
- Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Pharmacy and Molecular Biotechnology, University of Heidelberg, Heidelberg 69120, Germany
- Bioquant Center, University of Heidelberg, Im Neuenheimer Feld 267, Heidelberg 69120, Germany
| | - David T. W. Jones
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
| | - Ivo G. Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
|
261
|
Begum F, Ruczinski I, Li S, Silverman EK, Cho MH, Lynch DA, Curran-Everett D, Crapo J, Scharpf RB, Parker MM, Hetmanski JB, Beaty TH. Identifying a Deletion Affecting Total Lung Capacity Among Subjects in the COPDGene Study Cohort. Genet Epidemiol 2015; 40:81-8. [PMID: 26643968 DOI: 10.1002/gepi.21943] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/04/2015] [Revised: 09/21/2015] [Accepted: 10/19/2015] [Indexed: 01/17/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is a progressive disease with both environmental and genetic risk factors. Genome-wide association studies (GWAS) have identified multiple genomic regions influencing risk of COPD. To thoroughly investigate the genetic etiology of COPD, however, it is also important to explore the role of copy number variants (CNVs) because the presence of structural variants can alter gene expression and can be causal for some diseases. Here, we investigated effects of polymorphic CNVs on quantitative measures of pulmonary function and chest computed tomography (CT) phenotypes among subjects enrolled in COPDGene, a multisite study. COPDGene subjects consist of roughly one-third African American (AA) and two-thirds non-Hispanic white adult smokers (with or without COPD). We estimated CNVs using PennCNV on 9,076 COPDGene subjects using Illumina's Omni-Express genome-wide marker array. We tested for association between polymorphic CNV components (defined as disjoint intervals of copy number regions) for several quantitative phenotypes associated with COPD within each racial group. Among the AAs, we identified a polymorphic CNV on chromosome 5q35.2 located between two genes (FAM153B and SIMK1, but also harboring several pseudo-genes) giving genome-wide significance in tests of association with total lung capacity (TLCCT) as measured by chest CT scans. This is the first study of genome-wide association tests of polymorphic CNVs and TLCCT. Although the ARIC cohort did not have the phenotype of TLCCT, we found similar counts of CNV deletions and amplifications among AA and European subjects in this second cohort.
Affiliation(s)
- Ferdouse Begum
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
| | - Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
| | - Shengchao Li
- Cancer Genomics Research Laboratory (CGR), Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, Maryland, United States of America
| | - Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - David A Lynch
- Department of Medicine, National Jewish Health, Denver, Colorado, United States of America
| | - Douglas Curran-Everett
- Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, Colorado, United States of America
| | - James Crapo
- Department of Medicine, National Jewish Health, Denver, Colorado, United States of America
| | - Robert B Scharpf
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
| | - Margaret M Parker
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
| | - Jacqueline B Hetmanski
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
| | - Terri H Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
|
262
|
Liu Y, Liu J, Lu J, Peng J, Juan L, Zhu X, Li B, Wang Y. Joint detection of copy number variations in parent-offspring trios. Bioinformatics 2015; 32:1130-7. [PMID: 26644415 DOI: 10.1093/bioinformatics/btv707] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/19/2015] [Accepted: 11/27/2015] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION: Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy. RESULTS: In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while considering both GC content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for three samples of a parent-offspring trio. Through application to both simulated data and a trio from 1000 Genomes Project, we showed that TrioCNV achieved performance superior to that of existing approaches. AVAILABILITY AND IMPLEMENTATION: The software TrioCNV, implemented using a combination of Java and R, is freely available from the website at https://github.com/yongzhuang/TrioCNV. CONTACT: ydwang@hit.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Jianguo Lu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Liran Juan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Xiaolin Zhu
- Institute for Genomic Medicine, Columbia University, New York, NY 10032, University Program in Genetics and Genomics, Duke University Medical School, Durham, NC 27708
| | - Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN 37235 and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN 37235, USA
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
263
|
Schrider DR, Kern AD. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain. Genome Biol Evol 2015; 7:3511-28. [PMID: 26590212 PMCID: PMC4700959 DOI: 10.1093/gbe/evv228] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/23/2022] Open
Abstract
The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods.
Affiliation(s)
| | - Andrew D Kern
- Department of Genetics, Rutgers University, Piscataway Human Genetics Institute of New Jersey, Piscataway, New Jersey
|
264
|
Limbach M, Saare M, Tserel L, Kisand K, Eglit T, Sauer S, Axelsson T, Syvänen AC, Metspalu A, Milani L, Peterson P. Epigenetic profiling in CD4+ and CD8+ T cells from Graves' disease patients reveals changes in genes associated with T cell receptor signaling. J Autoimmun 2015; 67:46-56. [PMID: 26459776 DOI: 10.1016/j.jaut.2015.09.006] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 07/08/2015] [Revised: 09/24/2015] [Accepted: 09/29/2015] [Indexed: 11/25/2022]
Abstract
In Graves' disease (GD), a combination of genetic, epigenetic and environmental factors causes an autoimmune response to the thyroid gland, characterized by lymphocytic infiltrations and autoantibodies targeting the thyroid stimulating hormone receptor (TSHR) and other thyroid antigens. To identify the epigenetic changes involved in GD, we performed a genome-wide analysis of DNA methylation and enrichment of H3K4me3 and H3K27ac histone marks in sorted CD4+ and CD8+ T cells. We found 365 and 3322 differentially methylated CpG sites in CD4+ and CD8+ T cells, respectively. Among the hypermethylated CpG sites, we specifically found enrichment of genes involved in T cell signaling (CD247, LCK, ZAP70, CD3D, CD3E, CD3G, CTLA4 and CD8A) and decreased expression of CD3 gene family members. The hypermethylation was accompanied by decreased levels of H3K4me3 and H3K27ac marks at several T cell signaling genes in ChIP-seq analysis. In addition, we found hypermethylation of the TSHR gene first intron, where several GD-associated polymorphisms are located. Our results demonstrate an involvement of dysregulated DNA methylation and histone modifications at T cell signaling genes in GD patients.
Affiliation(s)
- Maia Limbach
- Molecular Pathology, Institute of Biomedical and Translational Medicine, Tartu, Estonia
| | - Mario Saare
- Molecular Pathology, Institute of Biomedical and Translational Medicine, Tartu, Estonia
| | - Liina Tserel
- Molecular Pathology, Institute of Biomedical and Translational Medicine, Tartu, Estonia
| | - Kai Kisand
- Molecular Pathology, Institute of Biomedical and Translational Medicine, Tartu, Estonia
| | - Triin Eglit
- Department of Internal Medicine, University of Tartu, Tartu, Estonia; Internal Medicine Clinic, Tartu University Hospital, Tartu, Estonia
| | - Sascha Sauer
- Max-Planck Institute for Molecular Genetics, Berlin, Germany
| | - Tomas Axelsson
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Ann-Christine Syvänen
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Andres Metspalu
- Estonian Genome Center, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Lili Milani
- Estonian Genome Center, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia.
| | - Pärt Peterson
- Molecular Pathology, Institute of Biomedical and Translational Medicine, Tartu, Estonia.
|
265
|
Arloth J, Bader DM, Röh S, Altmann A. Re-Annotator: Annotation Pipeline for Microarray Probe Sequences. PLoS One 2015; 10:e0139516. [PMID: 26426330 PMCID: PMC4591122 DOI: 10.1371/journal.pone.0139516] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 05/26/2015] [Accepted: 09/13/2015] [Indexed: 11/19/2022] Open
Abstract
Microarray technologies are established approaches for high throughput gene expression, methylation and genotyping analysis. An accurate mapping of the array probes is essential to generate reliable biological findings. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. Here, we present the Re-Annotator, a re-annotation pipeline for microarray probe sequences. It is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays. The Re-Annotator uses a custom-built mRNA reference database to identify the positions of gene expression array probe sequences. We applied Re-Annotator to the Illumina Human-HT12 v4 microarray platform and found that about one quarter (25%) of the probes differed from the manufacturer's annotation. In further computational experiments on experimental gene expression data, we compared Re-Annotator to another probe re-annotation tool, ReMOAT, and found that Re-Annotator provided an improved re-annotation of microarray probes. A thorough re-annotation of probe information is crucial to any microarray analysis. The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.
Affiliation(s)
- Janine Arloth
- Translational Research Department, Max Planck Institute of Psychiatry, Kraepelinstrasse 2–10, 80804, Munich, Germany
| | - Daniel M. Bader
- Translational Research Department, Max Planck Institute of Psychiatry, Kraepelinstrasse 2–10, 80804, Munich, Germany
- Gene Center Munich, Ludwig-Maximilians-Universität München, Feodor-Lynen Strasse 25, 81377, Munich, Germany
| | - Simone Röh
- Translational Research Department, Max Planck Institute of Psychiatry, Kraepelinstrasse 2–10, 80804, Munich, Germany
| | - Andre Altmann
- Department of Neurology and Neurological Sciences, Stanford University, School of Medicine, 780 Welch Road, CJ350 C38, CA-94304 Palo Alto, California, United States of America
|
266
|
Huber CD, DeGiorgio M, Hellmann I, Nielsen R. Detecting recent selective sweeps while controlling for mutation rate and background selection. Mol Ecol 2015; 25:142-56. [PMID: 26290347 PMCID: PMC5082542 DOI: 10.1111/mec.13351] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 03/31/2015] [Revised: 07/31/2015] [Accepted: 08/17/2015] [Indexed: 12/19/2022]
Abstract
A composite likelihood ratio test implemented in the program sweepfinder is a commonly used method for scanning a genome for recent selective sweeps. sweepfinder uses information on the spatial pattern (along the chromosome) of the site frequency spectrum around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson–Kreitman–Aguadé test, we suggest adding fixed differences relative to an out‐group to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection, modelled as a local reduction in the effective population size. Using simulations, we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone.
Affiliation(s)
- Christian D Huber
- Max F. Perutz Laboratory, University of Vienna, Vienna, Austria; Vienna Graduate School of Population Genetics, University of Veterinary Medicine, Vienna, Austria; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, 621 Charles E. Young Drive South, Los Angeles, CA, 90095-1606, USA
| | - Michael DeGiorgio
- Departments of Biology and Statistics, Pennsylvania State University, University Park, PA, USA; Institute for CyberScience, Pennsylvania State University, University Park, PA, USA
| | - Ines Hellmann
- Department Biologie II, Ludwig-Maximilians-Universität München, Großhaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | - Rasmus Nielsen
- Departments of Integrative Biology and Statistics, University of California, Berkeley, CA, USA
|
267
|
Lochovsky L, Zhang J, Fu Y, Khurana E, Gerstein M. LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations. Nucleic Acids Res 2015; 43:8123-34. [PMID: 26304545 PMCID: PMC4787796 DOI: 10.1093/nar/gkv803] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 01/26/2015] [Accepted: 07/28/2015] [Indexed: 01/22/2023] Open
Abstract
In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. Also, great mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation count, resulting in problematic background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. LARVA, moreover, uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).
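To illustrate why the abstract singles out overdispersion, the following is a minimal sketch (not LARVA's own code) comparing the upper-tail p-value for one element's mutation count under a plain binomial model and under a β-binomial model with the same mean. The counts, the background rate `p`, and the overdispersion value `rho` are all hypothetical; a real analysis would estimate these from the data.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def binom_pmf(k, n, p):
    return exp(log_choose(n, k) + k * log(p) + (n - k) * log(1 - p))

def betabinom_pmf(k, n, alpha, beta):
    return exp(log_choose(n, k)
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))

def upper_tail(pmf, k_obs, n, *params):
    # P(X >= k_obs) under the given null model
    return min(1.0, sum(pmf(k, n, *params) for k in range(k_obs, n + 1)))

# Hypothetical element: n trials, background mutation rate p, k_obs observed mutations.
n, p, k_obs = 1000, 0.01, 25

# Hypothetical overdispersion (intra-class correlation) rho = 1 / (alpha + beta + 1);
# alpha and beta are chosen so the beta-binomial mean equals n * p.
rho = 0.01
s = (1 - rho) / rho
alpha, beta = p * s, (1 - p) * s

p_binom = upper_tail(binom_pmf, k_obs, n, p)
p_bb = upper_tail(betabinom_pmf, k_obs, n, alpha, beta)
print(f"binomial p-value:      {p_binom:.3g}")
print(f"beta-binomial p-value: {p_bb:.3g}")
```

Under these assumed values the β-binomial tail is much heavier, so the same count looks far less surprising than the binomial model suggests. This is the mechanism by which ignoring overdispersion can inflate the number of apparently recurrent elements.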
Affiliation(s)
- Lucas Lochovsky
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Jing Zhang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Yao Fu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Ekta Khurana
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10065, USA
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT 06520, USA
268
Absence of canonical marks of active chromatin in developmentally regulated genes. Nat Genet 2015; 47:1158-1167. [PMID: 26280901 PMCID: PMC4625605 DOI: 10.1038/ng.3381] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 06/02/2015] [Accepted: 07/22/2015] [Indexed: 12/13/2022]
Abstract
The interplay of active and repressive histone modifications is assumed to have a key role in the regulation of gene expression. In contrast to this generally accepted view, we show that the transcription of genes temporally regulated during fly and worm development occurs in the absence of canonically active histone modifications. Conversely, strong chromatin marking is related to transcriptional and post-transcriptional stability, an association that we also observe in mammals. Our results support a model in which chromatin marking is associated with the stable production of RNA, whereas unmarked chromatin would permit rapid gene activation and deactivation during development. In the latter case, regulation by transcription factors would have a comparatively more important regulatory role than chromatin marks.
269
Bao R, Hernandez K, Huang L, Kang W, Bartom E, Onel K, Volchenboum S, Andrade J. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification. PLoS One 2015; 10:e0135800. [PMID: 26271043 PMCID: PMC4535852 DOI: 10.1371/journal.pone.0135800] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/02/2015] [Accepted: 07/27/2015] [Indexed: 12/30/2022] Open
Abstract
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
Affiliation(s)
- Riyue Bao
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Kyle Hernandez
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Lei Huang
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Wenjun Kang
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Elizabeth Bartom
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
- Kenan Onel
  - Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
- Samuel Volchenboum
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Pediatrics, The University of Chicago, Chicago, Illinois, United States of America
  - Computation Institute, The University of Chicago, Chicago, Illinois, United States of America
  - * E-mail: (JA); (SV)
- Jorge Andrade
  - Center for Research Informatics, The University of Chicago, Chicago, Illinois, United States of America
  - * E-mail: (JA); (SV)
270
Killcoyne S, Del Sol A. Identification of large-scale genomic variation in cancer genomes using in silico reference models. Nucleic Acids Res 2015; 44:e5. [PMID: 26264669 PMCID: PMC4705683 DOI: 10.1093/nar/gkv828] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/29/2015] [Accepted: 08/01/2015] [Indexed: 12/21/2022] Open
Abstract
Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed.
Affiliation(s)
- Sarah Killcoyne
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue Swing, Belvaux L-4367, Luxembourg
- Antonio Del Sol
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 6, Avenue Swing, Belvaux L-4367, Luxembourg
271
RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 2015; 198:59-73. [PMID: 25236449 PMCID: PMC4174954 DOI: 10.1534/genetics.114.165886] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Indexed: 12/30/2022] Open
Abstract
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
272
Miga KH, Eisenhart C, Kent WJ. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Res 2015; 43:e133. [PMID: 26163063 PMCID: PMC4787761 DOI: 10.1093/nar/gkv671] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/07/2014] [Accepted: 06/18/2015] [Indexed: 11/14/2022] Open
Abstract
The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets.
Affiliation(s)
- Karen H Miga
  - Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher Eisenhart
  - Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
  - Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
273
Kaspi A, Ziemann M, Keating ST, Khurana I, Connor T, Spolding B, Cooper A, Lazarus R, Walder K, Zimmet P, El-Osta A. Non-referenced genome assembly from epigenomic short-read data. Epigenetics 2015; 9:1329-38. [PMID: 25437048 DOI: 10.4161/15592294.2014.969610] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
Abstract
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.
Affiliation(s)
- Antony Kaspi
  - Epigenetics in Human Health and Disease Laboratory; Baker IDI Heart and Diabetes Institute; The Alfred Medical Research and Education Precinct; Melbourne, Victoria, Australia
274
Marco-Sola S, Ribeca P. Efficient Alignment of Illumina-Like High-Throughput Sequencing Reads with the GEnomic Multi-tool (GEM) Mapper. Curr Protoc Bioinformatics 2015; 50:11.13.1-11.13.20. [PMID: 26094690 DOI: 10.1002/0471250953.bi1113s50] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
Modern Illumina-like high-throughput sequencing machines allow the cheap decoding of great amounts of DNA. The GEnomic Multi-tool (GEM) mapper is one of the fastest and most sensitive methods known to date to align such data to a known genomic reference. This unit explains how to use it effectively.
Affiliation(s)
- Paolo Ribeca
  - Centro Nacional de Análisis Genómico, Barcelona, Spain
  - The Pirbright Institute, Woking, United Kingdom
275
Wang C, Lv Y, Wang B, Yin C, Lin Y, Pan L. Survey of protein-DNA interactions in Aspergillus oryzae on a genomic scale. Nucleic Acids Res 2015; 43:4429-46. [PMID: 25883143 PMCID: PMC4482085 DOI: 10.1093/nar/gkv334] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/21/2014] [Accepted: 03/31/2015] [Indexed: 01/23/2023] Open
Abstract
The genome-scale delineation of in vivo protein–DNA interactions is key to understanding genome function. Only ∼5% of transcription factors (TFs) in the Aspergillus genus have been identified using traditional methods. Although the Aspergillus oryzae genome contains >600 TFs, knowledge of the in vivo genome-wide TF-binding sites (TFBSs) in aspergilli remains limited because of the lack of high-quality antibodies. We investigated the landscape of in vivo protein–DNA interactions across the A. oryzae genome through coupling the DNase I digestion of intact nuclei with massively parallel sequencing and the analysis of cleavage patterns in protein–DNA interactions at single-nucleotide resolution. The resulting map identified overrepresented de novo TF-binding motifs from genomic footprints, and provided the detailed chromatin remodeling patterns and the distribution of digital footprints near transcription start sites. The TFBSs of 19 known Aspergillus TFs were also identified based on DNase I digestion data surrounding potential binding sites in conjunction with TF binding specificity information. We observed that the cleavage patterns of TFBSs were dependent on the orientation of TF motifs and independent of strand orientation, consistent with the DNA shape features of binding motifs with flanking sequences.
Affiliation(s)
- Chao Wang
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
- Yangyong Lv
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
- Bin Wang
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
- Chao Yin
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
- Ying Lin
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
- Li Pan
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, 510006, China
276
Wood DLA, Nones K, Steptoe A, Christ A, Harliwong I, Newell F, Bruxner TJC, Miller D, Cloonan N, Grimmond SM. Recommendations for Accurate Resolution of Gene and Isoform Allele-Specific Expression in RNA-Seq Data. PLoS One 2015; 10:e0126911. [PMID: 25965996 PMCID: PMC4428808 DOI: 10.1371/journal.pone.0126911] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/03/2014] [Accepted: 04/09/2015] [Indexed: 11/19/2022] Open
Abstract
Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual’s phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample specific alignment confounders and random sampling even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as ASE of interest to researchers studying those loci.
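As a rough illustration of the idea of testing allelic imbalance and aggregating intragenic SNPs, the sketch below applies an exact two-sided binomial test to phased read counts, first per SNP and then summed across a gene. It is a simplified stand-in, not the pipeline evaluated in the study: the read counts are hypothetical, the null ratio of 0.5 assumes no reference-mapping bias, and summing counts assumes each read is counted at only one SNP.

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))

# Hypothetical phased counts (haplotype A reads, haplotype B reads)
# at three heterozygous SNPs within one gene.
snp_counts = [(18, 7), (22, 9), (15, 6)]

# Per-SNP tests: each SNP alone has limited power at moderate depth.
per_snp_p = [binom_two_sided_p(a, a + b) for a, b in snp_counts]

# Gene-level test: aggregate reads by haplotype across SNPs, then test once.
hap_a = sum(a for a, _ in snp_counts)
hap_b = sum(b for _, b in snp_counts)
p_gene = binom_two_sided_p(hap_a, hap_a + hap_b)

print("per-SNP p-values:", [round(x, 4) for x in per_snp_p])
print("gene-level p-value:", p_gene)
```

Aggregation only makes sense when haplotype phase is known, since otherwise counts from opposite haplotypes could be summed together and cancel a real imbalance.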
Affiliation(s)
- David L. A. Wood
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
  - * E-mail:
- Katia Nones
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Anita Steptoe
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Angelika Christ
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Ivon Harliwong
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Felicity Newell
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Timothy J. C. Bruxner
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- David Miller
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
- Nicole Cloonan
  - QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, QLD, 4006, Australia
- Sean M. Grimmond
  - Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia
  - Translational Research Centre, University of Glasgow, Glasgow, Scotland
277
Teixeira JC, de Filippo C, Weihmann A, Meneu JR, Racimo F, Dannemann M, Nickel B, Fischer A, Halbwax M, Andre C, Atencia R, Meyer M, Parra G, Pääbo S, Andrés AM. Long-Term Balancing Selection in LAD1 Maintains a Missense Trans-Species Polymorphism in Humans, Chimpanzees, and Bonobos. Mol Biol Evol 2015; 32:1186-96. [PMID: 25605789 DOI: 10.1093/molbev/msv007] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/22/2022] Open
Abstract
Balancing selection maintains advantageous genetic and phenotypic diversity in populations. When selection acts for long evolutionary periods selected polymorphisms may survive species splits and segregate in present-day populations of different species. Here, we investigate the role of long-term balancing selection in the evolution of protein-coding sequences in the Homo-Pan clade. We sequenced the exome of 20 humans, 20 chimpanzees, and 20 bonobos and detected eight coding trans-species polymorphisms (trSNPs) that are shared among the three species and have segregated for approximately 14 My of independent evolution. Although the majority of these trSNPs were found in three genes of the major histocompatibility locus cluster, we also uncovered one coding trSNP (rs12088790) in the gene LAD1. All these trSNPs show clustering of sequences by allele rather than by species and also exhibit other signatures of long-term balancing selection, such as segregating at intermediate frequency and lying in a locus with high genetic diversity. Here, we focus on the trSNP in LAD1, a gene that encodes for Ladinin-1, a collagenous anchoring filament protein of basement membrane that is responsible for maintaining cohesion at the dermal-epidermal junction; the gene is also an autoantigen responsible for linear IgA disease. This trSNP results in a missense change (Leucine257Proline) and, besides altering the protein sequence, is associated with changes in gene expression of LAD1.
Affiliation(s)
- João C Teixeira
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Cesare de Filippo
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Antje Weihmann
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Juan R Meneu
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Fernando Racimo
  - Department of Integrative Biology, University of California, Berkeley
- Michael Dannemann
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Birgit Nickel
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anne Fischer
  - International Center for Insect Physiology and Ecology, Nairobi, Kenya
- Michel Halbwax
  - Clinique vétérinaire du Dr. Jacquemin, Maisons-Alfort, France
- Claudine Andre
  - Lola Ya Bonobo sanctuary, Kinshasa, Democratic Republic Congo
- Rebeca Atencia
  - Réserve Naturelle Sanctuaire à Chimpanzés de Tchimpounga, Jane Goodall Institute, Pointe-Noire, Republic of Congo
- Matthias Meyer
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Genís Parra
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Svante Pääbo
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Aida M Andrés
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
278
Ji Y, Marra NJ, DeWoody JA. Comparative analysis of active retrotransposons in the transcriptomes of three species of heteromyid rodents. Gene 2015; 562:95-106. [DOI: 10.1016/j.gene.2015.02.058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2014] [Revised: 02/16/2015] [Accepted: 02/17/2015] [Indexed: 10/24/2022]
279
Cebola I, Rodríguez-Seguí SA, Cho CHH, Bessa J, Rovira M, Luengo M, Chhatriwala M, Berry A, Ponsa-Cobas J, Maestro MA, Jennings RE, Pasquali L, Morán I, Castro N, Hanley NA, Gomez-Skarmeta JL, Vallier L, Ferrer J. TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors. Nat Cell Biol 2015; 17:615-626. [PMID: 25915126 PMCID: PMC4434585 DOI: 10.1038/ncb3160] [Citation(s) in RCA: 163] [Impact Index Per Article: 18.1] [Received: 08/30/2014] [Accepted: 03/13/2015] [Indexed: 02/02/2023]
Abstract
The genomic regulatory programmes that underlie human organogenesis are poorly understood. Pancreas development, in particular, has pivotal implications for pancreatic regeneration, cancer and diabetes. We have now characterized the regulatory landscape of embryonic multipotent progenitor cells that give rise to all pancreatic epithelial lineages. Using human embryonic pancreas and embryonic-stem-cell-derived progenitors we identify stage-specific transcripts and associated enhancers, many of which are co-occupied by transcription factors that are essential for pancreas development. We further show that TEAD1, a Hippo signalling effector, is an integral component of the transcription factor combinatorial code of pancreatic progenitor enhancers. TEAD and its coactivator YAP activate key pancreatic signalling mediators and transcription factors, and regulate the expansion of pancreatic progenitors. This work therefore uncovers a central role for TEAD and YAP as signal-responsive regulators of multipotent pancreatic progenitors, and provides a resource for the study of embryonic development of the human pancreas.
Affiliation(s)
- Inês Cebola
  - Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Santiago A. Rodríguez-Seguí
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
  - Laboratorio de Fisiología y Biología Molecular, Departamento de Fisiología, Biología Molecular y Celular, IFIBYNE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA Buenos Aires, Argentina
- Candy H.-H. Cho
  - Wellcome Trust and MRC Stem Cells Centre, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery and Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
- José Bessa
  - Instituto de Biologia Molecular e Celular (IBMC), 4150-180 Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
- Meritxell Rovira
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
- Mario Luengo
  - Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, 41013 Sevilla, Spain
- Mariya Chhatriwala
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Andrew Berry
  - Centre for Endocrinology and Diabetes, Institute of Human Development, Faculty of Medical & Human Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Joan Ponsa-Cobas
  - Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Miguel Angel Maestro
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
- Rachel E. Jennings
  - Centre for Endocrinology and Diabetes, Institute of Human Development, Faculty of Medical & Human Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Lorenzo Pasquali
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
- Ignasi Morán
  - Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Natalia Castro
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
- Neil A. Hanley
  - Centre for Endocrinology and Diabetes, Institute of Human Development, Faculty of Medical & Human Sciences, Manchester Academic Health Sciences Centre, University of Manchester, Manchester M13 9PT, United Kingdom
  - Endocrinology Department, Central Manchester University Hospitals NHS Foundation Trust, Manchester M13 9WU, United Kingdom
- Jose Luis Gomez-Skarmeta
  - Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, 41013 Sevilla, Spain
- Ludovic Vallier
  - Wellcome Trust and MRC Stem Cells Centre, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery and Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Jorge Ferrer
  - Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
  - Genomic Programming of Beta-cells Laboratory, Institut d’Investigacions August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain
  - CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), 08036 Barcelona, Spain
280
Walker BA, Wardell CP, Murison A, Boyle EM, Begum DB, Dahir NM, Proszek PZ, Melchor L, Pawlyn C, Kaiser MF, Johnson DC, Qiang YW, Jones JR, Cairns DA, Gregory WM, Owen RG, Cook G, Drayson MT, Jackson GH, Davies FE, Morgan GJ. APOBEC family mutational signatures are associated with poor prognosis translocations in multiple myeloma. Nat Commun 2015; 6:6997. [PMID: 25904160 PMCID: PMC4568299 DOI: 10.1038/ncomms7997] [Citation(s) in RCA: 223] [Impact Index Per Article: 24.8] [Received: 10/02/2014] [Accepted: 03/24/2015] [Indexed: 12/12/2022] Open
Abstract
We have sequenced 463 presenting cases of myeloma entered into the UK Myeloma XI study using whole exome sequencing. Here we identify mutations induced as a consequence of misdirected AID in the partner oncogenes of IGH translocations, which are activating and associated with impaired clinical outcome. An APOBEC mutational signature is seen in 3.8% of cases and is linked to the translocation-mediated deregulation of MAF and MAFB, a known poor prognostic factor. Patients with this signature have an increased mutational load and a poor prognosis. Loss of MAF or MAFB expression results in decreased APOBEC3B and APOBEC4 expression, indicating a transcriptional control mechanism. Kataegis, a further mutational pattern associated with APOBEC deregulation, is seen at the sites of the MYC translocation. The APOBEC mutational signature seen in myeloma is, therefore, associated with poor prognosis primary and secondary translocations and the molecular mechanisms involved in generating them.
Affiliation(s)
- Brian A Walker
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Christopher P Wardell
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Alex Murison
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Eileen M Boyle
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Dil B Begum
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Nasrin M Dahir
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Paula Z Proszek
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Lorenzo Melchor
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Charlotte Pawlyn
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Martin F Kaiser
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - David C Johnson
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - Ya-Wei Qiang
- Myeloma Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, USA
| | - John R Jones
- Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK
| | - David A Cairns
- Clinical Trials Research Unit, University of Leeds, Leeds LS2 9JT, UK
| | - Walter M Gregory
- Clinical Trials Research Unit, University of Leeds, Leeds LS2 9JT, UK
| | - Roger G Owen
- St James's University Hospital, University of Leeds, Leeds LS2 9JT, UK
| | - Gordon Cook
- St James's University Hospital, University of Leeds, Leeds LS2 9JT, UK
| | - Mark T Drayson
- Clinical Immunology, School of Immunity & Infection, University of Birmingham, Birmingham B15 2TT, UK
| | - Graham H Jackson
- Department of Haematology, Newcastle University, Newcastle-Upon-Tyne NE1 7RU, UK
| | - Faith E Davies
- [1] Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK [2] Myeloma Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, USA
| | - Gareth J Morgan
- [1] Division of Molecular Pathology, The Institute of Cancer Research, London SM2 5NG, UK [2] Myeloma Institute, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, USA
|
281
|
Xue Y, Prado-Martinez J, Sudmant PH, Narasimhan V, Ayub Q, Szpak M, Frandsen P, Chen Y, Yngvadottir B, Cooper DN, de Manuel M, Hernandez-Rodriguez J, Lobon I, Siegismund HR, Pagani L, Quail MA, Hvilsom C, Mudakikwa A, Eichler EE, Cranfield MR, Marques-Bonet T, Tyler-Smith C, Scally A. Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science 2015; 348:242-245. [PMID: 25859046 PMCID: PMC4668944 DOI: 10.1126/science.aaa3952] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Received: 11/30/2014] [Accepted: 03/03/2015] [Indexed: 12/30/2022]
Abstract
Mountain gorillas are an endangered great ape subspecies and a prominent focus for conservation, yet we know little about their genomic diversity and evolutionary past. We sequenced whole genomes from multiple wild individuals and compared the genomes of all four Gorilla subspecies. We found that the two eastern subspecies have experienced a prolonged population decline over the past 100,000 years, resulting in very low genetic diversity and an increased overall burden of deleterious variation. A further recent decline in the mountain gorilla population has led to extensive inbreeding, such that individuals are typically homozygous at 34% of their sequence, leading to the purging of severely deleterious recessive mutations from the population. We discuss the causes of their decline and the consequences for their future survival.
Affiliation(s)
- Yali Xue
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Javier Prado-Martinez
- Institut de Biologia Evolutiva (CSIC/UPF), Parque de Investigación Biomédica de Barcelona (PRBB), Barcelona, Catalonia 08003, Spain
| | - Peter H. Sudmant
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Vagheesh Narasimhan
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, UK
| | - Qasim Ayub
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Michal Szpak
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Peter Frandsen
- Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
| | - Yuan Chen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Bryndis Yngvadottir
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - David N. Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff CF14 4XN, UK
| | - Marc de Manuel
- Institut de Biologia Evolutiva (CSIC/UPF), Parque de Investigación Biomédica de Barcelona (PRBB), Barcelona, Catalonia 08003, Spain
| | - Jessica Hernandez-Rodriguez
- Institut de Biologia Evolutiva (CSIC/UPF), Parque de Investigación Biomédica de Barcelona (PRBB), Barcelona, Catalonia 08003, Spain
| | - Irene Lobon
- Institut de Biologia Evolutiva (CSIC/UPF), Parque de Investigación Biomédica de Barcelona (PRBB), Barcelona, Catalonia 08003, Spain
| | - Hans R. Siegismund
- Department of Biology, University of Copenhagen, DK-2200 Copenhagen N, Denmark
| | - Luca Pagani
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
- Department of Biological, Geological and Environmental Sciences, University of Bologna, 40134 Bologna, Italy
| | - Michael A. Quail
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Christina Hvilsom
- Research and Conservation, Copenhagen Zoo, DK-2000 Frederiksberg, Denmark
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 91895, USA
| | - Michael R. Cranfield
- Gorilla Doctors, Karen C. Drayer Wildlife Health Center, University of California, Davis, CA 95616, USA
| | - Tomas Marques-Bonet
- Institut de Biologia Evolutiva (CSIC/UPF), Parque de Investigación Biomédica de Barcelona (PRBB), Barcelona, Catalonia 08003, Spain
- Centro Nacional de Análisis Genómico (Parc Cientific de Barcelona), Baldiri Reixac 4, 08028 Barcelona, Spain
| | - Chris Tyler-Smith
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Aylwyn Scally
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
|
282
|
St Laurent G, Wahlestedt C, Kapranov P. The Landscape of long noncoding RNA classification. Trends Genet 2015; 31:239-51. [PMID: 25869999 DOI: 10.1016/j.tig.2015.03.007] [Citation(s) in RCA: 836] [Impact Index Per Article: 92.9] [Received: 01/12/2015] [Revised: 03/09/2015] [Accepted: 03/12/2015] [Indexed: 12/12/2022]
Abstract
Advances in the depth and quality of transcriptome sequencing have revealed many new classes of long noncoding RNAs (lncRNAs). lncRNA classification has mushroomed to accommodate these new findings, even though the real dimensions and complexity of the noncoding transcriptome remain unknown. Although evidence of functionality of specific lncRNAs continues to accumulate, conflicting, confusing, and overlapping terminology has fostered ambiguity and lack of clarity in the field in general. The lack of a fundamental, conceptually unambiguous classification framework results in a number of challenges in the annotation and interpretation of noncoding transcriptome data. It also might undermine integration of the new genomic methods and datasets in an effort to unravel the function of lncRNA. Here, we review existing lncRNA classifications, nomenclature, and terminology. Then, we describe the conceptual guidelines that have emerged for their classification and functional annotation based on expanding and more comprehensive use of large systems biology-based datasets.
Affiliation(s)
- Georges St Laurent
- St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801 USA; Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, 185 Meeting Street, Providence, RI 02912, USA
| | - Claes Wahlestedt
- Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 NW 10th Ave, Miami, FL 33136 USA.
| | - Philipp Kapranov
- Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, China 361021; St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801 USA.
|
283
|
Leng N, Li Y, McIntosh BE, Nguyen BK, Duffin B, Tian S, Thomson JA, Dewey CN, Stewart R, Kendziorski C. EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments. Bioinformatics 2015; 31:2614-22. [PMID: 25847007 PMCID: PMC4528625 DOI: 10.1093/bioinformatics/btv193] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 10/14/2014] [Accepted: 03/30/2015] [Indexed: 01/08/2023] Open
Abstract
Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data. Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression. Availability and implementation: An R package containing examples and sample datasets is available at Bioconductor. Contact: kendzior@biostat.wisc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ning Leng
- Department of Statistics, University of Wisconsin, Madison, WI, USA, Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - Yuan Li
- Department of Statistics, University of Wisconsin, Madison, WI, USA
| | - Brian E McIntosh
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - Bao Kim Nguyen
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - Bret Duffin
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - Shulan Tian
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - James A Thomson
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA, Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA, Department of Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA, USA
| | - Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
| | - Ron Stewart
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
| | - Christina Kendziorski
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
|
284
|
Deelen P, Zhernakova DV, de Haan M, van der Sijde M, Bonder MJ, Karjalainen J, van der Velde KJ, Abbott KM, Fu J, Wijmenga C, Sinke RJ, Swertz MA, Franke L. Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels. Genome Med 2015; 7:30. [PMID: 25954321 PMCID: PMC4423486 DOI: 10.1186/s13073-015-0152-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 11/25/2014] [Accepted: 03/09/2015] [Indexed: 11/10/2022] Open
Abstract
Background: RNA-sequencing (RNA-seq) is a powerful technique for the identification of genetic variants that affect gene-expression levels, either through expression quantitative trait locus (eQTL) mapping or through allele-specific expression (ASE) analysis. Given increasing numbers of RNA-seq samples in the public domain, we here studied to what extent eQTLs and ASE effects can be identified when using public RNA-seq data while deriving the genotypes from the RNA-sequencing reads themselves. Methods: We downloaded the raw reads for all available human RNA-seq datasets. Using these reads we performed gene expression quantification. All samples were jointly normalized and subjected to a strict quality control. We also derived genotypes using the RNA-seq reads and used imputation to infer non-coding variants. This allowed us to perform eQTL mapping and ASE analyses jointly on all samples that passed quality control. Our results were validated using samples for which DNA-seq genotypes were available. Results: 4,978 public human RNA-seq runs, representing many different tissues and cell-types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols are limited. In a joint analysis on the 1,262 samples with high quality genotypes, we identified cis-eQTL effects for 8,034 unique genes (at a false discovery rate ≤0.05). eQTL mapping on individual tissues revealed that a limited number of samples already suffice to identify tissue-specific eQTLs for known disease-associated genetic variants. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Conclusions: By deriving and imputing genotypes from RNA-seq data, it is possible to identify both eQTLs and ASE effects. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become especially relevant for studying the effects of tissue-specific and rare pathogenic genetic variants to aid clinical interpretation of exome and genome sequencing. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0152-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Patrick Deelen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
| | - Daria V Zhernakova
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Mark de Haan
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
| | - Marijke van der Sijde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Marc Jan Bonder
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Juha Karjalainen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - K Joeri van der Velde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
| | - Kristin M Abbott
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Jingyuan Fu
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Cisca Wijmenga
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Richard J Sinke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
| | - Morris A Swertz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 RB Groningen, The Netherlands
| | - Lude Franke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 RB Groningen, The Netherlands
|
285
|
Vaqué JP, Martínez N, Varela I, Fernández F, Mayorga M, Derdak S, Beltrán S, Moreno T, Almaraz C, De las Heras G, Bayés M, Gut I, Crespo J, Piris MA. Colorectal adenomas contain multiple somatic mutations that do not coincide with synchronous adenocarcinoma specimens. PLoS One 2015; 10:e0119946. [PMID: 25775023 PMCID: PMC4361059 DOI: 10.1371/journal.pone.0119946] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/21/2014] [Accepted: 01/22/2015] [Indexed: 11/18/2022] Open
Abstract
We have performed a comparative ultrasequencing study of multiple colorectal lesions obtained simultaneously from four patients. Our data show that benign lesions (adenomatous or hyperplastic polyps) contain a high mutational load. Additionally, multiple synchronous colorectal lesions show non-overlapping mutational signatures, highlighting the degree of heterogeneity between multiple specimens in the same patient. Observations in these cases imply that not only the number of mutations but also an effective oncogenic combination of mutations can determine the malignant progression of colorectal lesions.
Affiliation(s)
- José P. Vaqué
- Cancer Genomics Group, IDIVAL, Instituto de Investigación Marqués de Valdecilla, Santander, Spain
| | - Nerea Martínez
- Cancer Genomics Group, IDIVAL, Instituto de Investigación Marqués de Valdecilla, Santander, Spain
| | - Ignacio Varela
- IBBTEC-UC-CSIC-SODERCAN Instituto de Biomedicina y Biotecnología de Cantabria, Santander, Spain
| | - Fidel Fernández
- Department of Pathology, Hospital Universitario Marqués de Valdecilla, Santander, Spain
| | - Marta Mayorga
- Department of Pathology, Hospital Universitario Marqués de Valdecilla, Santander, Spain
| | - Sophia Derdak
- Centro Nacional de Análisis Genómico, CNAG, Barcelona, Spain
| | - Sergi Beltrán
- Centro Nacional de Análisis Genómico, CNAG, Barcelona, Spain
| | - Thaidy Moreno
- IBBTEC-UC-CSIC-SODERCAN Instituto de Biomedicina y Biotecnología de Cantabria, Santander, Spain
| | - Carmen Almaraz
- Cancer Genomics Group, IDIVAL, Instituto de Investigación Marqués de Valdecilla, Santander, Spain
| | - Gonzalo De las Heras
- Gastroenterology and Hepatology Unit, Hospital Universitario Marqués de Valdecilla, Santander, Spain
| | - Mónica Bayés
- Centro Nacional de Análisis Genómico, CNAG, Barcelona, Spain
| | - Ivo Gut
- Centro Nacional de Análisis Genómico, CNAG, Barcelona, Spain
| | - Javier Crespo
- Gastroenterology and Hepatology Unit, Hospital Universitario Marqués de Valdecilla, Santander, Spain
- Infection, Immunity and Digestive Pathology Group, IFIMAV, Santander, Spain
| | - Miguel A. Piris
- Cancer Genomics Group, IDIVAL, Instituto de Investigación Marqués de Valdecilla, Santander, Spain
- Department of Pathology, Hospital Universitario Marqués de Valdecilla, Santander, Spain
|
286
|
Younkin SG, Scharpf RB, Schwender H, Parker MM, Scott AF, Marazita ML, Beaty TH, Ruczinski I. A genome-wide study of inherited deletions identified two regions associated with nonsyndromic isolated oral clefts. Birth Defects Res A Clin Mol Teratol 2015; 103:276-83. [PMID: 25776870 DOI: 10.1002/bdra.23362] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
Abstract
BACKGROUND: DNA copy number variants play an important part in the development of common birth defects such as oral clefts. Individual patients with multiple birth defects (including oral clefts) have been shown to carry small and large chromosomal deletions. METHODS: We investigated the role of polymorphic copy number deletions by comparing transmission rates of deletions from parents to offspring in case-parent trios of European ancestry ascertained through a cleft proband with trios ascertained through a normal offspring. DNA copy numbers in trios were called using the joint hidden Markov model in the freely available PennCNV software. All statistical analyses were performed using Bioconductor tools in the open source environment R. RESULTS: We identified a 67 kb region in the gene MGAM on chromosome 7q34, and a 206 kb region overlapping genes ADAM3A and ADAM5 on chromosome 8p11, where deletions are more frequently transmitted to cleft offspring than control offspring. CONCLUSIONS: These genes or nearby regulatory elements may be involved in the etiology of oral clefts.
Affiliation(s)
- Samuel G Younkin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore
|
287
|
Supek F, Lehner B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 2015; 521:81-4. [PMID: 25707793 PMCID: PMC4425546 DOI: 10.1038/nature14173] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Received: 09/30/2014] [Accepted: 12/19/2014] [Indexed: 12/26/2022]
Abstract
Cancer genome sequencing has revealed considerable variation in somatic mutation rates across the human genome, with mutation rates elevated in heterochromatic late replicating regions and reduced in early replicating euchromatin. Multiple mechanisms have been suggested to underlie this, but the actual cause is unknown. Here we identify variable DNA mismatch repair (MMR) as the basis of this variation. Analysing ∼17 million single-nucleotide variants from the genomes of 652 tumours, we show that regional autosomal mutation rates at megabase resolution are largely stable across cancer types, with differences related to changes in replication timing and gene expression. However, mutations arising after the inactivation of MMR are no longer enriched in late replicating heterochromatin relative to early replicating euchromatin. Thus, differential DNA repair and not differential mutation supply is the primary cause of the large-scale regional mutation rate variation across the human genome.
Affiliation(s)
- Fran Supek
- [1] EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain [2] Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain [3] Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
| | - Ben Lehner
- [1] EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain [2] Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain [3] Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
|
288
|
Welch JD, Baran-Gale J, Perou CM, Sethupathy P, Prins JF. Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and ceRNA potential. BMC Genomics 2015; 16:113. [PMID: 25765044 PMCID: PMC4344757 DOI: 10.1186/s12864-015-1227-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/10/2014] [Accepted: 01/08/2015] [Indexed: 12/14/2022] Open
Abstract
Background: Recent studies have shown that some pseudogenes are transcribed and contribute to cancer when dysregulated. In particular, pseudogene transcripts can function as competing endogenous RNAs (ceRNAs). The high similarity of gene and pseudogene nucleotide sequence has hindered experimental investigation of these mechanisms using RNA-seq. Furthermore, previous studies of pseudogenes in breast cancer have not integrated miRNA expression data in order to perform large-scale analysis of ceRNA potential. Thus, knowledge of both pseudogene ceRNA function and the role of pseudogene expression in cancer are restricted to isolated examples. Results: To investigate whether transcribed pseudogenes play a pervasive regulatory role in cancer, we developed a novel bioinformatic method for measuring pseudogene transcription from RNA-seq data. We applied this method to 819 breast cancer samples from The Cancer Genome Atlas (TCGA) project. We then clustered the samples using pseudogene expression levels and integrated sample-paired pseudogene, gene and miRNA expression data with miRNA target prediction to determine whether more pseudogenes have ceRNA potential than expected by chance. Conclusions: Our analysis identifies with high confidence a set of 440 pseudogenes that are transcribed in breast cancer tissue. Of this set, 309 pseudogenes exhibit significant differential expression among breast cancer subtypes. Hierarchical clustering using only pseudogene expression levels accurately separates tumor samples from normal samples and discriminates the Basal subtype from the Luminal and Her2 subtypes. Correlation analysis shows more positively correlated pseudogene-parent gene pairs and negatively correlated pseudogene-miRNA pairs than expected by chance. Furthermore, 177 transcribed pseudogenes possess binding sites for co-expressed miRNAs that are also predicted to target their parent genes. Taken together, these results increase the catalog of putative pseudogene ceRNAs and suggest that pseudogene transcription in breast cancer may play a larger role than previously appreciated. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1227-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Joshua D Welch
- Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Jeanette Baran-Gale
- Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Charles M Perou
- Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Praveen Sethupathy
- Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Jan F Prins
- Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA; Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
|
289
|
Cahill JA, Stirling I, Kistler L, Salamzade R, Ersmark E, Fulton TL, Stiller M, Green RE, Shapiro B. Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears. Mol Ecol 2015; 24:1205-17. [PMID: 25490862 PMCID: PMC4409089 DOI: 10.1111/mec.13038] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 02/07/2014] [Revised: 11/15/2014] [Accepted: 11/26/2014] [Indexed: 12/16/2022]
Abstract
Polar bears are an Arctic, marine-adapted species that is closely related to brown bears. Genome analyses have shown that polar bears are distinct and genetically homogeneous in comparison to brown bears. However, these analyses have also revealed a remarkable episode of polar bear gene flow into the population of brown bears that colonized the Admiralty, Baranof and Chichagof islands (ABC islands) of Alaska. Here, we present an analysis of data from a large panel of polar bear and brown bear genomes that includes brown bears from the ABC islands, the Alaskan mainland and Europe. Our results provide clear evidence that gene flow between the two species had a geographically wide impact, with polar bear DNA found within the genomes of brown bears living both on the ABC islands and in the Alaskan mainland. Intriguingly, while brown bear genomes contain up to 8.8% polar bear ancestry, polar bear genomes appear to be devoid of brown bear ancestry, suggesting the presence of a barrier to gene flow in that direction.
Affiliation(s)
- James A Cahill
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, 95064, USA
|
290
|
Brynildsrud O, Snipen LG, Bohlin J. CNOGpro: detection and quantification of CNVs in prokaryotic whole-genome sequencing data. Bioinformatics 2015; 31:1708-15. [DOI: 10.1093/bioinformatics/btv070] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 04/22/2014] [Accepted: 01/28/2015] [Indexed: 01/22/2023] Open
|
291
|
Lagging-strand replication shapes the mutational landscape of the genome. Nature 2015; 518:502-506. [PMID: 25624100 PMCID: PMC4374164 DOI: 10.1038/nature14183] [Citation(s) in RCA: 168] [Impact Index Per Article: 18.7] [Received: 10/14/2014] [Accepted: 01/05/2015] [Indexed: 12/21/2022]
Abstract
The origin of mutations is central to understanding evolution and of key relevance to health. Variation occurs non-randomly across the genome, and mechanisms for this remain to be defined. Here, we report that the 5′-ends of Okazaki fragments have significantly elevated levels of nucleotide substitution, indicating a replicative origin for such mutations. With a novel method, emRiboSeq, we map the genome-wide contribution of polymerases, and show that despite Okazaki fragment processing, DNA synthesised by error-prone Pol-α is retained in vivo, comprising ~1.5% of the mature genome. We propose that DNA-binding proteins that rapidly re-associate post-replication act as partial barriers to Pol-δ mediated displacement of Pol-α synthesised DNA, resulting in incorporation of such Pol-α tracts and elevated mutation rates at specific sites. We observe a mutational cost to chromatin and regulatory protein binding, resulting in mutation hotspots at regulatory elements, with signatures of this process detectable in both yeast and humans.
|
292
|
Tsujimura T, Klein FA, Langenfeld K, Glaser J, Huber W, Spitz F. A discrete transition zone organizes the topological and regulatory autonomy of the adjacent Tfap2c and Bmp7 genes. PLoS Genet 2015; 11:e1004897. [PMID: 25569170 PMCID: PMC4288730 DOI: 10.1371/journal.pgen.1004897] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 08/15/2014] [Accepted: 11/17/2014] [Indexed: 12/11/2022] Open
Abstract
Despite the well-documented role of remote enhancers in controlling developmental gene expression, the mechanisms that allocate enhancers to genes are poorly characterized. Here, we investigate the cis-regulatory organization of the locus containing the Tfap2c and Bmp7 genes in vivo, using a series of engineered chromosomal rearrangements. While these genes lie adjacent to one another, we demonstrate that they are independently regulated by distinct sets of enhancers, which in turn define non-overlapping regulatory domains. Chromosome conformation capture experiments reveal a corresponding partition of the locus in two distinct structural entities, demarcated by a discrete transition zone. The impact of engineered chromosomal rearrangements on the topology of the locus and the resultant gene expression changes indicate that this transition zone functionally organizes the structural partition of the locus, thereby defining enhancer-target gene allocation. This partition is, however, not absolute: we show that it allows competing interactions across it that may be non-productive for the competing gene, but modulate expression of the competed one. Altogether, these data highlight the prime role of the topological organization of the genome in long-distance regulation of gene expression.
The specificity of enhancer-gene interactions is fundamental to the execution of gene regulatory programs underpinning embryonic development and cell differentiation. However, our understanding of the mechanisms conferring specificity to enhancers and target gene interactions is limited. In this study, we characterize the cis-regulatory organization of a large genomic locus consisting of two developmental genes, Tfap2c and Bmp7. We show that this locus is structurally partitioned into two distinct domains by the constitutive action of a discrete transition zone located between the two genes. This separation restricts selectively the functional action of enhancers to the genes present within the same domain. Interestingly, the effects of this region as a boundary are relative, as it allows some competing interactions to take place across domains. We show that these interactions modulate the functional output of a brain enhancer on its primary target gene resulting in the spatial restriction of its expression domain. These results support a functional link between topological chromatin domains and allocation of enhancers to genes. They further show that a precise adjustment of chromatin interaction levels fine-tunes gene regulation by long-range enhancers.
Affiliation(s)
- Taro Tsujimura
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Felix A. Klein
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Katja Langenfeld
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Juliane Glaser
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Wolfgang Huber
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - François Spitz
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
293
|
Meng FL, Du Z, Federation A, Hu J, Wang Q, Kieffer-Kwon KR, Meyers RM, Amor C, Wasserman CR, Neuberg D, Casellas R, Nussenzweig MC, Bradner JE, Liu XS, Alt FW. Convergent transcription at intragenic super-enhancers targets AID-initiated genomic instability. Cell 2014; 159:1538-48. [PMID: 25483776 PMCID: PMC4322776 DOI: 10.1016/j.cell.2014.11.014] [Citation(s) in RCA: 190] [Impact Index Per Article: 19.0] [Received: 08/01/2014] [Revised: 10/01/2014] [Accepted: 10/27/2014] [Indexed: 01/08/2023]
Abstract
Activation-induced cytidine deaminase (AID) initiates both somatic hypermutation (SHM) for antibody affinity maturation and DNA breakage for antibody class switch recombination (CSR) via transcription-dependent cytidine deamination of single-stranded DNA targets. Though largely specific for immunoglobulin genes, AID also acts on a limited set of off-targets, generating oncogenic translocations and mutations that contribute to B cell lymphoma. How AID is recruited to off-targets has been a long-standing mystery. Based on deep GRO-seq studies of mouse and human B lineage cells activated for CSR or SHM, we report that most robust AID off-target translocations occur within highly focal regions of target genes in which sense and antisense transcription converge. Moreover, we found that such AID-targeting "convergent" transcription arises from antisense transcription that emanates from super-enhancers within sense transcribed gene bodies. Our findings provide an explanation for AID off-targeting to a small subset of mostly lineage-specific genes in activated B cells.
Affiliation(s)
- Fei-Long Meng
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Zhou Du
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
| | - Alexander Federation
- Department of Medical Oncology, Dana-Farber Cancer Institute, and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Jiazhi Hu
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Qiao Wang
- Howard Hughes Medical Institute, Laboratory of Molecular Immunology, The Rockefeller University, New York, NY 10065, USA
| | - Kyong-Rim Kieffer-Kwon
- Genomics and Immunity, NIAMS, and Center of Cancer Research, NCI, National Institutes of Health, Bethesda, MD 20892, USA
| | - Robin M Meyers
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Corina Amor
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Caitlyn R Wasserman
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Donna Neuberg
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA 02115, USA
| | - Rafael Casellas
- Genomics and Immunity, NIAMS, and Center of Cancer Research, NCI, National Institutes of Health, Bethesda, MD 20892, USA
| | - Michel C Nussenzweig
- Howard Hughes Medical Institute, Laboratory of Molecular Immunology, The Rockefeller University, New York, NY 10065, USA
| | - James E Bradner
- Department of Medical Oncology, Dana-Farber Cancer Institute, and Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
| | - X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA 02115, USA
| | - Frederick W Alt
- Howard Hughes Medical Institute, Program in Cellular and Molecular Medicine, Boston Children's Hospital, and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA.
|
294
|
Bergval I, Coll F, Schuitema A, de Ronde H, Mallard K, Pain A, McNerney R, Clark TG, Anthony RM. A proportion of mutations fixed in the genomes of in vitro selected isogenic drug-resistant Mycobacterium tuberculosis mutants can be detected as minority variants in the parent culture. FEMS Microbiol Lett 2014; 362:1-7. [PMID: 25670707 DOI: 10.1093/femsle/fnu037] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022] Open
Abstract
We studied genomic variation in a previously selected collection of isogenic Mycobacterium tuberculosis laboratory strains subjected to one or two rounds of antibiotic selection. Whole genome sequencing analysis identified eleven single, unique mutations (four synonymous, six non-synonymous, one intergenic), in addition to drug resistance-conferring mutations, that were fixed in the genomes of six monoresistant strains. Eight loci, present as minority variants (five non-synonymous, three synonymous) in the genome of the susceptible parent strain, became fixed in the genomes of multiple daughter strains. None of these mutations is known to be involved in drug resistance. Our results confirm previously observed genomic stability for M. tuberculosis, although the parent strain had accumulated allelic variants at multiple locations in an antibiotic-free in vitro environment. It is therefore reasonable to assume that these so-called hitchhiking mutations were co-selected and fixed in multiple daughter strains during antibiotic selection. The presence of multiple allelic variations, accumulated under non-selective conditions, which become fixed during subsequent selective steps, deserves attention. The wider availability of 'deep' sequencing methods could help to detect multiple bacterial (sub)populations within patients with high resolution and would therefore be useful in assisting in the detailed investigation of transmission chains.
Affiliation(s)
- Indra Bergval
- KIT Biomedical Research, Royal Tropical Institute, Meibergdreef 39, 1105 AZ Amsterdam, Netherlands
| | - Francesc Coll
- London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
| | - Anja Schuitema
- KIT Biomedical Research, Royal Tropical Institute, Meibergdreef 39, 1105 AZ Amsterdam, Netherlands
| | - Hans de Ronde
- KIT Biomedical Research, Royal Tropical Institute, Meibergdreef 39, 1105 AZ Amsterdam, Netherlands
| | - Kim Mallard
- London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
| | - Arnab Pain
- King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Ruth McNerney
- London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
| | - Taane G Clark
- London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom
| | - Richard M Anthony
- KIT Biomedical Research, Royal Tropical Institute, Meibergdreef 39, 1105 AZ Amsterdam, Netherlands
|
295
|
Ma W, Ay F, Lee C, Gulsoy G, Deng X, Cook S, Hesson J, Cavanaugh C, Ware CB, Krumm A, Shendure J, Blau CA, Disteche CM, Noble WS, Duan Z. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat Methods 2014; 12:71-8. [PMID: 25437436 PMCID: PMC4281301 DOI: 10.1038/nmeth.3205] [Citation(s) in RCA: 148] [Impact Index Per Article: 14.8] [Received: 09/03/2014] [Accepted: 10/17/2014] [Indexed: 12/18/2022]
Abstract
High-throughput methods based on chromosome conformation capture have greatly advanced our understanding of the three-dimensional (3D) organization of genomes but are limited in resolution by their reliance on restriction enzymes. Here we describe a method called DNase Hi-C for comprehensively mapping global chromatin contacts. DNase Hi-C uses DNase I for chromatin fragmentation, leading to greatly improved efficiency and resolution over that of Hi-C. Coupling this method with DNA-capture technology provides a high-throughput approach for targeted mapping of fine-scale chromatin architecture. We applied targeted DNase Hi-C to characterize the 3D organization of 998 large intergenic noncoding RNA (lincRNA) promoters in two human cell lines. Our results revealed that expression of lincRNAs is tightly controlled by complex mechanisms involving both super-enhancers and the Polycomb repressive complex. Our results provide the first glimpse of the cell type-specific 3D organization of lincRNA genes.
Affiliation(s)
- Wenxiu Ma
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Ferhat Ay
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Gunhan Gulsoy
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Xinxian Deng
- Department of Pathology, University of Washington, Seattle, Washington, USA
| | - Savannah Cook
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Department of Comparative Medicine, University of Washington, Seattle, Washington, USA
| | - Jennifer Hesson
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Department of Comparative Medicine, University of Washington, Seattle, Washington, USA
| | - Christopher Cavanaugh
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Department of Comparative Medicine, University of Washington, Seattle, Washington, USA
| | - Carol B Ware
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Department of Comparative Medicine, University of Washington, Seattle, Washington, USA
| | - Anton Krumm
- Department of Radiation Oncology, University of Washington, Seattle, Washington, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Carl Anthony Blau
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Division of Hematology, University of Washington, Seattle, Washington, USA
| | | | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Zhijun Duan
[1] Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA. [2] Division of Hematology, University of Washington, Seattle, Washington, USA
|
296
|
Abstract
Power-law distributions are the main functional form for the distribution of repeat size and repeat copy number in the human genome. When the genome is broken into fragments for sequencing, the limited size of fragments and reads may prevent a unique alignment of repeat sequences to the reference sequence. Repeats in the human genome can be as long as 10^4 bases, or 10^5 to 10^6 bases when allowing for mismatches between repeat units. Sequence reads from these regions are therefore unmappable when the read length is in the range of 10^3 bases. With a read length of 1000 bases, slightly more than 1% of the assembled genome, and slightly less than 1% of the 1 kb reads, are unmappable, excluding the unassembled portion of the human genome (8% in GRCh37/hg19). The slow decay (long tail) of the power-law function implies a diminishing return in converting unmappable regions/reads to become mappable with the increase of the read length, with the understanding that increasing read length will always move toward the direction of 100% mappability.
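The diminishing return described in this abstract can be illustrated with a small sketch. It assumes a pure power-law tail P(X > x) = (x_min / x)^alpha for repeat length; the function name `tail_fraction` and the values of `alpha` and `x_min` are hypothetical choices for illustration, not parameters fitted or reported by the authors.

```python
def tail_fraction(read_len, alpha=1.0, x_min=100.0):
    """Fraction of repeats longer than read_len under an assumed
    power-law tail P(X > x) = (x_min / x) ** alpha.

    alpha and x_min are illustrative assumptions, not fitted values.
    """
    if read_len <= x_min:
        return 1.0
    return (x_min / read_len) ** alpha


# Each tenfold increase in read length shrinks the tail by the same
# constant factor (10 ** -alpha), so the tail never drops to zero.
for read_len in (100, 1_000, 10_000, 100_000):
    print(read_len, tail_fraction(read_len))
```

Under these assumptions, going from 1 kb to 10 kb reads removes far more of the tail in absolute terms than going from 10 kb to 100 kb, which is the long-tail behaviour the abstract refers to.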
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System Manhasset, NY, USA
| | - Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System Manhasset, NY, USA
|
297
|
Auerbach SS, Phadke DP, Mav D, Holmgren S, Gao Y, Xie B, Shin JH, Shah RR, Merrick BA, Tice RR. RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights. J Appl Toxicol 2014; 35:766-80. [DOI: 10.1002/jat.3068] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/22/2014] [Revised: 07/22/2014] [Accepted: 07/26/2014] [Indexed: 12/13/2022]
Affiliation(s)
- Scott S. Auerbach
- Biomolecular Screening Branch, Division of the National Toxicology Program; National Institute of Environmental Health Sciences; Research Triangle Park NC 27709 USA
| | | | | | - Stephanie Holmgren
- Library & Information Services Branch, Office of the Deputy Director; National Institute of Environmental Health Sciences; Research Triangle Park NC 27709 USA
| | - Yuan Gao
- Department of Biomedical Engineering; Johns Hopkins University; Baltimore MD 21205 USA
| | - Bin Xie
- Department of Biomedical Engineering; Johns Hopkins University; Baltimore MD 21205 USA
| | - Joo Heon Shin
- Department of Biomedical Engineering; Johns Hopkins University; Baltimore MD 21205 USA
| | | | - B. Alex Merrick
- Biomolecular Screening Branch, Division of the National Toxicology Program; National Institute of Environmental Health Sciences; Research Triangle Park NC 27709 USA
| | - Raymond R. Tice
- Biomolecular Screening Branch, Division of the National Toxicology Program; National Institute of Environmental Health Sciences; Research Triangle Park NC 27709 USA
|
298
|
Thung DT, Beulen L, Hehir-Kwa J, Faas BH. Implementation of whole genome massively parallel sequencing for noninvasive prenatal testing in laboratories. Expert Rev Mol Diagn 2014; 15:111-24. [PMID: 25347354 DOI: 10.1586/14737159.2015.973857] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Noninvasive prenatal testing (NIPT) for fetal aneuploidies using cell-free fetal DNA in maternal plasma has revolutionized the field of prenatal care and methods using massively parallel sequencing are now being implemented almost worldwide. Substantial progress has been made from initially testing for (an)euploidies of chromosomes 13, 18 and 21, to testing for sex chromosome (an)euploidies, additional autosomal aneuploidies as well as partial deletions and duplications genome-wide. Although NIPT is associated with significantly reduced risks for the fetus in comparison to existing invasive prenatal diagnostic methods, it presents several implementation challenges. Here, we review key issues potentially influencing NIPT and illustrate them using both data from literature and in-house data.
|
299
|
Xu C, Zhang J, Wang YP, Deng HW, Li J. Characterization of human chromosomal material exchange with regard to the chromosome translocations using next-generation sequencing data. Genome Biol Evol 2014; 6:3015-24. [PMID: 25349267 PMCID: PMC4255766 DOI: 10.1093/gbe/evu234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022] Open
Abstract
As an important subtype of structural variation, chromosomal translocation is associated with various diseases, especially cancers, by disrupting gene structures and functions. Traditional methods for identifying translocations are time consuming and have limited resolution. Recently, a few studies have employed next-generation sequencing (NGS) technology for characterizing chromosomal translocations in the human genome, obtaining high-throughput results at high resolution. However, these studies are mainly focused on mechanism-specific or site-specific translocation mapping. In this study, we conducted a comprehensive genome-wide analysis on the characterization of human chromosomal material exchange with regard to the chromosome translocations. Using NGS data of 1,481 subjects from the 1000 Genomes Project, we identified 15,349,092 translocated DNA fragment pairs, ranging from 65 to 1,886 bp and with an average size of approximately 102 bp. On average, each individual genome carried about 10,364 pairs, covering approximately 0.069% of the genome. We identified 16 translocation hot regions, among which two regions did not contain repetitive fragments. Results of our study overlapped with a majority of previous results, containing approximately 79% of approximately 2,340 translocations characterized in three available translocation databases. In addition, our study identified five novel potential recurrent chromosomal material exchange regions with greater than 20% detection rates. Our results will be helpful for an accurate characterization of translocations in human genomes, and contribute as a resource for future studies of the roles of translocations in human disease etiology and mechanisms.
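The per-genome figures quoted in this abstract can be roughly cross-checked from the totals it reports. The sketch below uses only numbers stated in the abstract, plus two assumptions that are not from the source: a genome size of about 3.1 Gb, and that each pair contributes two fragments of the average size.

```python
# Figures reported in the abstract.
total_pairs = 15_349_092      # translocated fragment pairs across all subjects
subjects = 1_481              # 1000 Genomes Project subjects analysed
mean_fragment_bp = 102        # approximate average fragment size

# Assumptions for this back-of-the-envelope check (not from the source).
genome_bp = 3.1e9             # assumed human genome size
fragments_per_pair = 2        # assumed: both fragments of a pair are counted

pairs_per_genome = total_pairs / subjects
covered_bp = pairs_per_genome * fragments_per_pair * mean_fragment_bp
percent_covered = 100 * covered_bp / genome_bp

print(round(pairs_per_genome))       # close to the reported ~10,364 pairs
print(round(percent_covered, 3))     # same order as the reported ~0.069%
```

Under these assumptions the result lands near the reported coverage; the small difference is expected given the rounded average fragment size and the assumed genome length.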
Affiliation(s)
- Chao Xu
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University
| | - Jigang Zhang
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University
| | - Yu-Ping Wang
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University; Department of Biomedical Engineering, School of Science and Engineering, Tulane University
| | - Hong-Wen Deng
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University; Third Affiliated Hospital, China Southern Medical University, Guang Zhou, 510000, P. R. China
| | - Jian Li
- Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University
|
300
|
Ho ED, Cao Q, Lee SD, Yip KY. VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants. BMC Genomics 2014; 15:886. [PMID: 25306238 PMCID: PMC4210471 DOI: 10.1186/1471-2164-15-886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2014] [Accepted: 10/03/2014] [Indexed: 12/29/2022] Open
Abstract
Background High-throughput experimental methods have fostered the systematic detection of millions of genetic variants from any human genome. To help explore the potential biological implications of these genetic variants, software tools have been previously developed for integrating various types of information about these genomic regions from multiple data sources. Most of these tools were designed either for studying a small number of variants at a time, or for local execution on powerful machines. Results To make exploration of whole lists of genetic variants simple and accessible, we have developed a new Web-based system called VAS (Variant Annotation System, available at https://yiplab.cse.cuhk.edu.hk/vas/). It provides a large variety of information useful for studying both coding and non-coding variants, including whole-genome transcription factor binding, open chromatin and transcription data from the ENCODE consortium. By means of data compression, millions of variants can be uploaded from a client machine to the server in less than 50 megabytes of data. On the server side, our customized data integration algorithms can efficiently link millions of variants with tens of whole-genome datasets. These two enabling technologies make VAS a practical tool for annotating genetic variants from large genomic studies. We demonstrate the use of VAS in annotating genetic variants obtained from a migraine meta-analysis study and multiple data sets from the Personal Genomes Project. We also compare the running time of annotating 6.4 million SNPs of the CEU trio by VAS and another tool, showing that VAS is efficient in handling new variant lists without requiring any pre-computations. Conclusions VAS is specially designed to handle annotation tasks with long lists of genetic variants and large numbers of annotating features efficiently. It is complementary to other existing tools with more specific aims such as evaluating the potential impacts of genetic variants in terms of disease risk. We recommend using VAS for a quick first-pass identification of potentially interesting genetic variants, to minimize the time required for other more in-depth downstream analyses.
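The abstract does not describe how VAS links variants to genomic features, so the following is only a generic illustration of the task, not the authors' algorithm. It sketches one common approach, a binary search over sorted, non-overlapping intervals on a single chromosome; the function `annotate`, the half-open coordinate convention, and the feature names are all hypothetical.

```python
from bisect import bisect_right


def annotate(variant_positions, features):
    """Map each variant position to the feature that contains it.

    features: iterable of (start, end, name) tuples on one chromosome,
    assumed non-overlapping and half-open, i.e. [start, end).
    Returns a dict {position: name or None}.
    """
    ordered = sorted(features)
    starts = [start for start, _, _ in ordered]
    result = {}
    for pos in variant_positions:
        # Index of the last feature whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ordered[i][1]:
            result[pos] = ordered[i][2]
        else:
            result[pos] = None
    return result


hits = annotate([5, 15, 20, 25], [(10, 20, "peak_A"), (22, 30, "peak_B")])
print(hits)
```

Each lookup costs O(log n) in the number of features after a single sort, which is one reason this kind of layout scales to long variant lists; overlapping features would need an interval tree or a sweep over sorted inputs instead.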
Affiliation(s)
| | | | | | - Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
|