51
Polani S, Dean M, Lichter-Peled A, Hendrickson S, Tsang S, Fang X, Feng Y, Qiao W, Avni G, Kahila Bar-Gal G. Sequence Variant in the TRIM39-RPP21 Gene Readthrough is Shared Across a Cohort of Arabian Foals Diagnosed with Juvenile Idiopathic Epilepsy. Journal of Genetic Mutation Disorders 2022; 1:103. [PMID: 35465405 PMCID: PMC9031527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Juvenile idiopathic epilepsy (JIE) is a self-limiting neurological disorder with a suspected genetic predisposition affecting young Arabian foals of the Egyptian lineage. The condition is characterized by tonic-clonic seizures with intermittent post-ictal blindness, in which most incidents are sporadic and unrecognized. This study aimed to identify genetic components shared across a local cohort of Arabian foals diagnosed with JIE via a combined whole-genome and targeted resequencing approach. Initial whole-genome comparisons between a small cohort of nine diagnosed foals (cases) and 27 controls from other horse breeds identified variants uniquely shared among the case cohort. Further validation via targeted resequencing of those variants located in non-intergenic regions, in eleven additional case individuals, revealed a single 19 bp deletion coupled with a triple-C insertion (Δ19InsCCC) within the TRIM39-RPP21 gene readthrough that was uniquely shared across all case individuals and absent from three additional Arabian controls. Furthermore, we confirmed recent findings refuting potential linkage between JIE and other inherited diseases in the Arabian lineage, and refuted the potential linkage between JIE and genes predisposing to a similar disorder in human newborns. This is the first study to report a genetic variant shared in a sub-population cohort of Arabian foals diagnosed with JIE. Further evaluation of the sensitivity and specificity of the Δ19InsCCC allele within additional cohorts of the Arabian horse is warranted in order to validate its credibility as a marker for JIE, and to ascertain whether it has been introduced into other horse breeds by Arabian ancestry.
Affiliation(s)
- S Polani
  - Koret School of Veterinary Medicine, The Robert H. Smith Faculty of Agriculture, Food and Environmental Sciences, The Hebrew University of Jerusalem, Rehovot, Israel
- M Dean
  - National Cancer Institute, Division of Cancer Epidemiology & Genetics, Laboratory of Translational Genomics, USA
- A Lichter-Peled
  - Koret School of Veterinary Medicine, The Robert H. Smith Faculty of Agriculture, Food and Environmental Sciences, The Hebrew University of Jerusalem, Rehovot, Israel
- S Hendrickson
  - Department of Biology, Shepherd University, Shepherdstown, USA
- X Fang
  - BGI-Shenzhen, Shenzhen, China
- Y Feng
  - BGI-Shenzhen, Shenzhen, China
- W Qiao
  - BGI-Shenzhen, Shenzhen, China
- G Avni
  - Medisoos Equine Clinic, Kibutz Magal, Israel
- G Kahila Bar-Gal
  - Koret School of Veterinary Medicine, The Robert H. Smith Faculty of Agriculture, Food and Environmental Sciences, The Hebrew University of Jerusalem, Rehovot, Israel
52
Wang R, Jiang Y. Copy Number Variation Detection by Single-Cell DNA Sequencing with SCOPE. Methods Mol Biol 2022; 2493:279-288. [PMID: 35751822 DOI: 10.1007/978-1-0716-2293-3_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Whole-genome single-cell DNA sequencing (scDNA-seq) enables the characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we describe SCOPE, a normalization and copy number estimation method for scDNA-seq data. We give an overview of the methodology and illustrate SCOPE with step-by-step demonstrations.
Affiliation(s)
- Rujin Wang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Yuchao Jiang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
53
Iannucci A, Benazzo A, Natali C, Arida EA, Zein MSA, Jessop TS, Bertorelle G, Ciofi C. Population structure, genomic diversity and demographic history of Komodo dragons inferred from whole-genome sequencing. Mol Ecol 2021; 30:6309-6324. [PMID: 34390519 PMCID: PMC9292392 DOI: 10.1111/mec.16121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/31/2020] [Revised: 07/28/2021] [Accepted: 08/03/2021] [Indexed: 02/07/2023]
Abstract
Population and conservation genetics studies have greatly benefited from the development of new techniques and bioinformatic tools associated with next-generation sequencing. Analysis of extensive data sets from whole-genome sequencing of even a few individuals allows the detection of patterns of fine-scale population structure and detailed reconstruction of demographic dynamics through time. In this study, we investigated the population structure, genomic diversity and demographic history of the Komodo dragon (Varanus komodoensis), the world's largest lizard, by sequencing the whole genomes of 24 individuals from the five main Indonesian islands comprising the entire range of the species. Three main genomic groups were observed. The populations of the Island of Komodo and the northern coast of Flores, in particular, were identified as two distinct conservation units. Degrees of genomic divergence among island populations were interpreted as a result of changes in sea level affecting connectivity across islands. Demographic inference suggested that Komodo dragons probably experienced a relatively steep population decline over the last million years, reaching a relatively stable Ne during the Saalian glacial cycle (400-150 thousand years ago) followed by a rapid Ne decrease. Genomic diversity of Komodo dragons was similar to that found in endangered or already extinct reptile species. Overall, this study provides an example of how whole-genome analysis of a few individuals per population can help define population structure and intraspecific demographic dynamics. This is particularly important when applying population genomics data to conservation of rare or elusive endangered species.
Affiliation(s)
- Andrea Benazzo
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Chiara Natali
  - Department of Biology, University of Florence, Firenze, Italy
- Evy Ayu Arida
  - Research Center for Biology, The Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia
- Moch Samsul Arifin Zein
  - Research Center for Biology, The Indonesian Institute of Sciences (LIPI), Cibinong Science Center, Cibinong, Indonesia
- Tim S. Jessop
  - School of Life and Environmental Sciences, Deakin University, Geelong, Vic., Australia
- Giorgio Bertorelle
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Claudio Ciofi
  - Department of Biology, University of Florence, Firenze, Italy
54
Le Page L, Gillespie A, Schwartz JC, Prawits LM, Schlerka A, Farrell CP, Hammond JA, Baldwin CL, Telfer JC, Hammer SE. Subpopulations of swine γδ T cells defined by TCRγ and WC1 gene expression. Developmental and Comparative Immunology 2021; 125:104214. [PMID: 34329647 DOI: 10.1016/j.dci.2021.104214] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/04/2021] [Revised: 07/24/2021] [Accepted: 07/24/2021] [Indexed: 06/13/2023]
Abstract
γδ T cells constitute a major portion of lymphocytes in the blood of both ruminants and swine. Subpopulations of swine γδ T cells have been distinguished by CD2 and CD8α expression. However, it was not clear whether they have distinct expression profiles of their T-cell receptor (TCR) or WC1 genes. Identifying receptor expression will contribute to understanding the functional differences between these subpopulations and their contributions to immune protection. Here, we annotated three genomic assemblies of the swine TCRγ gene locus, finding four gene cassettes containing C, J and V genes, although some haplotypes carried a null TRGC gene (TRGC4). Genes in the TRGC1 cassette were homologs of those in the bovine TRGC5 cassette, while the others were not homologous to bovine genes. We then evaluated three principal populations of γδ T cells (CD2+/SWC5-, CD2-/SWC5+, and CD2-/SWC5-). Both CD2- subpopulations transcribed WC1 co-receptor genes, albeit with different patterns of gene expression, but CD2+ cells did not. All subpopulations transcribed TCR genes from all four cassettes, although there were differences in expression levels. Finally, the CD2+ and CD2- γδ T-cell populations differed in their representation in various organs and tissues, presumably at least partially reflective of different ligand specificities for their receptors.
Affiliation(s)
- Lauren Le Page
  - Department of Veterinary & Animal Sciences, University of Massachusetts, Amherst, MA, USA
- Alexandria Gillespie
  - Department of Veterinary & Animal Sciences, University of Massachusetts, Amherst, MA, USA
- Lisa-Maria Prawits
  - Institute of Immunology, Department of Pathobiology, University of Veterinary Medicine, Vienna, Austria
- Angela Schlerka
  - Institute of Immunology, Department of Pathobiology, University of Veterinary Medicine, Vienna, Austria
- Colin P Farrell
  - Division of Hematology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Cynthia L Baldwin
  - Department of Veterinary & Animal Sciences, University of Massachusetts, Amherst, MA, USA
- Janice C Telfer
  - Department of Veterinary & Animal Sciences, University of Massachusetts, Amherst, MA, USA
- Sabine E Hammer
  - Institute of Immunology, Department of Pathobiology, University of Veterinary Medicine, Vienna, Austria
55
Montefiori LE, Bendig S, Gu Z, Chen X, Pölönen P, Ma X, Murison A, Zeng A, Garcia-Prat L, Dickerson K, Iacobucci I, Abdelhamed S, Hiltenbrand R, Mead PE, Mehr CM, Xu B, Cheng Z, Chang TC, Westover T, Ma J, Stengel A, Kimura S, Qu C, Valentine MB, Rashkovan M, Luger S, Litzow MR, Rowe JM, den Boer ML, Wang V, Yin J, Kornblau SM, Hunger SP, Loh ML, Pui CH, Yang W, Crews KR, Roberts KG, Yang JJ, Relling MV, Evans WE, Stock W, Paietta EM, Ferrando AA, Zhang J, Kern W, Haferlach T, Wu G, Dick JE, Klco JM, Haferlach C, Mullighan CG. Enhancer Hijacking Drives Oncogenic BCL11B Expression in Lineage-Ambiguous Stem Cell Leukemia. Cancer Discov 2021; 11:2846-2867. [PMID: 34103329 PMCID: PMC8563395 DOI: 10.1158/2159-8290.cd-21-0145] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 02/02/2021] [Revised: 04/27/2021] [Accepted: 06/01/2021] [Indexed: 11/16/2022]
Abstract
Lineage-ambiguous leukemias are high-risk malignancies of poorly understood genetic basis. Here, we describe a distinct subgroup of acute leukemia with expression of myeloid, T lymphoid, and stem cell markers driven by aberrant allele-specific deregulation of BCL11B, a master transcription factor responsible for thymic T-lineage commitment and specification. Mechanistically, this deregulation was driven by chromosomal rearrangements that juxtapose BCL11B to superenhancers active in hematopoietic progenitors, or focal amplifications that generate a superenhancer from a noncoding element distal to BCL11B. Chromatin conformation analyses demonstrated long-range interactions of rearranged enhancers with the expressed BCL11B allele and association of BCL11B with activated hematopoietic progenitor cell cis-regulatory elements, suggesting BCL11B is aberrantly co-opted into a gene regulatory network that drives transformation by maintaining a progenitor state. These data support a role for ectopic BCL11B expression in primitive hematopoietic cells mediated by enhancer hijacking as an oncogenic driver of human lineage-ambiguous leukemia. SIGNIFICANCE: Lineage-ambiguous leukemias pose significant diagnostic and therapeutic challenges due to a poorly understood molecular and cellular basis. We identify oncogenic deregulation of BCL11B driven by diverse structural alterations, including de novo superenhancer generation, as the driving feature of a subset of lineage-ambiguous leukemias that transcend current diagnostic boundaries. This article is highlighted in the In This Issue feature, p. 2659.
Affiliation(s)
- Lindsey E Montefiori
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Zhaohui Gu
  - Department of Computational and Quantitative Medicine, City of Hope Comprehensive Cancer Center, Duarte, California
  - Department of Systems Biology, City of Hope Comprehensive Cancer Center, Duarte, California
- Xiaolong Chen
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Petri Pölönen
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Xiaotu Ma
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Alex Murison
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Andy Zeng
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Laura Garcia-Prat
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Kirsten Dickerson
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Ilaria Iacobucci
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Sherif Abdelhamed
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Ryan Hiltenbrand
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Paul E Mead
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Cyrus M Mehr
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Beisi Xu
  - Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Zhongshan Cheng
  - Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Ti-Cheng Chang
  - Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee
- Tamara Westover
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Jing Ma
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Shunsuke Kimura
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Chunxu Qu
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Marcus B Valentine
  - Cytogenetics Core Facility, St. Jude Children's Research Hospital, Memphis, Tennessee
- Marissa Rashkovan
  - Institute for Cancer Genetics, Columbia University, New York, New York
- Selina Luger
  - Abramson Cancer Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Mark R Litzow
  - Division of Hematology, Department of Internal Medicine, Mayo Clinic, Rochester, Minnesota
- Jacob M Rowe
  - Department of Hematology, Shaare Zedek Medical Center, Jerusalem, Israel
- Victoria Wang
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts
- Jun Yin
  - Division of Clinical Trials and Biostatistics, Mayo Clinic, Rochester, Minnesota
- Steven M Kornblau
  - Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Stephen P Hunger
  - Department of Pediatrics, Children's Hospital of Philadelphia, and the Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Mignon L Loh
  - Department of Pediatrics, Benioff Children's Hospital and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California
- Ching-Hon Pui
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Wenjian Yang
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
- Kristine R Crews
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
- Kathryn G Roberts
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Jun J Yang
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
- Mary V Relling
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
- William E Evans
  - Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee
- Wendy Stock
  - University of Chicago Comprehensive Cancer Center, Chicago, Illinois
- Adolfo A Ferrando
  - Institute for Cancer Genetics, Columbia University, New York, New York
  - Department of Pediatrics, Columbia University, New York, New York
  - Department of Pathology and Cell Biology, Columbia University, New York, New York
  - Department of Systems Biology, Columbia University, New York, New York
- Jinghui Zhang
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Gang Wu
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
  - Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee
- John E Dick
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Jeffery M Klco
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Charles G Mullighan
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
56
Cerruti E, Gisbert C, Drost HG, Valentino D, Portis E, Barchi L, Prohens J, Lanteri S, Comino C, Catoni M. Grafting vigour is associated with DNA de-methylation in eggplant. Horticulture Research 2021; 8:241. [PMID: 34719687 PMCID: PMC8558322 DOI: 10.1038/s41438-021-00660-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/01/2021] [Revised: 07/20/2021] [Accepted: 07/30/2021] [Indexed: 05/08/2023]
Abstract
In horticulture, grafting is a popular technique used to combine positive traits from two different plants. This is achieved by joining the plant top part (scion) onto a rootstock which contains the stem and roots. Rootstocks can provide resistance to stress and increase plant production, but despite their wide use, the biological mechanisms driving rootstock-induced alterations of the scion phenotype remain largely unknown. Given that epigenetics plays a relevant role during distance signalling in plants, we studied the genome-wide DNA methylation changes induced in eggplant (Solanum melongena) scion using two interspecific rootstocks to increase vigour. We found that vigour was associated with a change in scion gene expression and a genome-wide hypomethylation in the CHH context. Interestingly, this hypomethylation correlated with the downregulation of younger and potentially more active long terminal repeat retrotransposable elements (LTR-TEs), suggesting that graft-induced epigenetic modifications are associated with both physiological and molecular phenotypes in grafted plants. Our results indicate that the enhanced vigour induced by heterografting in eggplant is associated with epigenetic modifications, as also observed in some heterotic hybrids.
Affiliation(s)
- Elisa Cerruti
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
  - The Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Carmina Gisbert
  - Institute for Conservation & Improvement of Valencian Agrodiversity (COMAV), Universitat Politècnica de València, Valencia, Spain
- Hajk-Georg Drost
  - The Sainsbury Laboratory, University of Cambridge, Cambridge, UK
  - Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Danila Valentino
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
- Ezio Portis
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
- Lorenzo Barchi
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
- Jaime Prohens
  - Institute for Conservation & Improvement of Valencian Agrodiversity (COMAV), Universitat Politècnica de València, Valencia, Spain
- Sergio Lanteri
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
- Cinzia Comino
  - Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
- Marco Catoni
  - The Sainsbury Laboratory, University of Cambridge, Cambridge, UK
  - School of Biosciences, University of Birmingham, Birmingham, United Kingdom
  - Institute for Sustainable Plant Protection, National Research Council of Italy, Torino, Italy
57
Bendixsen DP, Peris D, Stelkens R. Patterns of Genomic Instability in Interspecific Yeast Hybrids With Diverse Ancestries. Frontiers in Fungal Biology 2021; 2:742894. [PMID: 37744091 PMCID: PMC10512264 DOI: 10.3389/ffunb.2021.742894] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/16/2021] [Accepted: 09/06/2021] [Indexed: 09/26/2023]
Abstract
The genomes of hybrids often show substantial deviations from the features of the parent genomes, including genomic instabilities characterized by chromosomal rearrangements, gains, and losses. This plastic genomic architecture generates phenotypic diversity, potentially giving hybrids access to new ecological niches. It is however unclear if there are any generalizable patterns and predictability in the type and prevalence of genomic variation and instability across hybrids with different genetic and ecological backgrounds. Here, we analyzed the genomic architecture of 204 interspecific Saccharomyces yeast hybrids isolated from natural, industrial fermentation, clinical, and laboratory environments. Synchronous mapping to all eight putative parental species showed significant variation in read depth indicating frequent aneuploidy, affecting 44% of all hybrid genomes and particularly smaller chromosomes. Early generation hybrids with largely equal genomic content from both parent species were more likely to contain aneuploidies than introgressed genomes with an older hybridization history, which presumably stabilized the genome. Shared k-mer analysis showed that the degree of genomic diversity and variability varied among hybrids with different parent species. Interestingly, more genetically distant crosses produced more similar hybrid genomes, which may be a result of stronger negative epistasis at larger genomic divergence, putting constraints on hybridization outcomes. Mitochondrial genomes were typically inherited from the species also contributing the majority nuclear genome, but there were clear exceptions to this rule. Together, we find reliable genomic predictors of instability in hybrids, but also report interesting cross- and environment-specific idiosyncrasies. Our results are an important step in understanding the factors shaping divergent hybrid genomes and their role in adaptive evolution.
Affiliation(s)
- Devin P. Bendixsen
  - Population Genetics Division, Department of Zoology, Stockholm University, Stockholm, Sweden
- David Peris
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
  - Department of Health, Valencian International University, Valencia, Spain
- Rike Stelkens
  - Population Genetics Division, Department of Zoology, Stockholm University, Stockholm, Sweden
58
Han S, Basting PJ, Dias GB, Luhur A, Zelhof AC, Bergman CM. Transposable element profiles reveal cell line identity and loss of heterozygosity in Drosophila cell culture. Genetics 2021; 219:6321957. [PMID: 34849875 PMCID: PMC8633141 DOI: 10.1093/genetics/iyab113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/01/2021] [Accepted: 07/01/2021] [Indexed: 11/28/2022] [Open access]
Abstract
Cell culture systems allow key insights into biological mechanisms yet suffer from irreproducible outcomes in part because of cross-contamination or mislabeling of cell lines. Cell line misidentification can be mitigated by the use of genotyping protocols, which have been developed for human cell lines but are lacking for many important model species. Here, we leverage the classical observation that transposable elements (TEs) proliferate in cultured Drosophila cells to demonstrate that genome-wide TE insertion profiles can reveal the identity and provenance of Drosophila cell lines. We identify multiple cases where TE profiles clarify the origin of Drosophila cell lines (Sg4, mbn2, and OSS_E) relative to published reports, and also provide evidence that insertions from only a subset of long-terminal repeat retrotransposon families are necessary to mark Drosophila cell line identity. We also develop a new bioinformatics approach to detect TE insertions and estimate intra-sample allele frequencies in legacy whole-genome sequencing data (called ngs_te_mapper2), which revealed loss of heterozygosity as a mechanism shaping the unique TE profiles that identify Drosophila cell lines. Our work contributes to the general understanding of the forces impacting metazoan genomes as they evolve in cell culture and paves the way for high-throughput protocols that use TE insertions to authenticate cell lines in Drosophila and other organisms.
Affiliation(s)
- Shunhua Han
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Preston J Basting
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Guilherme B Dias
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Arthur Luhur
  - Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Andrew C Zelhof
  - Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Casey M Bergman
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Genetics, University of Georgia, Athens, GA 30602, USA
59
Pitsava G, Feldkamp ML, Pankratz N, Lane J, Kay DM, Conway KM, Shaw GM, Reefhuis J, Jenkins MM, Almli LM, Olshan AF, Pangilinan F, Brody LC, Sicko RJ, Hobbs CA, Bamshad M, McGoldrick D, Nickerson DA, Finnell RH, Mullikin J, Romitti PA, Mills JL. Exome sequencing of child-parent trios with bladder exstrophy: Findings in 26 children. Am J Med Genet A 2021; 185:3028-3041. [PMID: 34355505 PMCID: PMC8446314 DOI: 10.1002/ajmg.a.62439] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2021] [Revised: 05/31/2021] [Accepted: 07/08/2021] [Indexed: 12/31/2022]
Abstract
Bladder exstrophy (BE) is a rare, lower ventral midline defect with the bladder and part of the urethra exposed. The etiology of BE is unknown but thought to be influenced by genetic variation with more recent studies suggesting a role for rare variants. As such, we conducted paired-end exome sequencing in 26 child/mother/father trios. Three children had rare (allele frequency ≤ 0.0001 in several public databases) inherited variants in TSPAN4, one with a loss-of-function variant and two with missense variants. Two children had loss-of-function variants in TUBE1. Four children had rare missense or nonsense variants (one per child) in WNT3, CRKL, MYH9, or LZTR1, genes previously associated with BE. We detected 17 de novo missense variants in 13 children and three de novo loss-of-function variants (AKR1C2, PRRX1, PPM1D) in three children (one per child). We also detected rare compound heterozygous loss-of-function variants in PLCH2 and CLEC4M and rare inherited missense or loss-of-function variants in additional genes applying autosomal recessive (three genes) and X-linked recessive inheritance models (13 genes). Variants in two genes identified may implicate disruption in cell migration (TUBE1) and adhesion (TSPAN4) processes, mechanisms proposed for BE, and provide additional evidence for rare variants in the development of this defect.
Affiliation(s)
- Georgia Pitsava
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
- Marcia L. Feldkamp
  - Division of Medical Genetics, Department of Pediatrics, 295 Chipeta Way, Suite 2S010, University of Utah School of Medicine, Salt Lake City, Utah
- Nathan Pankratz
  - Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, Minnesota
- John Lane
  - Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, Minnesota
- Denise M. Kay
  - Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, New York
- Kristin M. Conway
  - Department of Epidemiology, College of Public Health, The University of Iowa, Iowa City, Iowa
- Gary M. Shaw
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, California
- Jennita Reefhuis
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia
- Mary M. Jenkins
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia
- Lynn M. Almli
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia
- Andrew F. Olshan
  - Department of Epidemiology, Gillings School of Global Public Health, Chapel Hill, North Carolina
- Faith Pangilinan
  - Gene and Environment Interaction Section, National Human Genome Research Institute, Bethesda, Maryland
- Lawrence C. Brody
  - Gene and Environment Interaction Section, National Human Genome Research Institute, Bethesda, Maryland
- Robert J. Sicko
  - Division of Genetics, Wadsworth Center, New York State Department of Health, Albany, New York
- Mike Bamshad
  - Department of Pediatrics, University of Washington, Seattle, Washington
- Daniel McGoldrick
  - Department of Genome Sciences, University of Washington, Seattle, Washington
- Richard H. Finnell
  - Center for Precision Environmental Health, Baylor College of Medicine, Houston, Texas
- James Mullikin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
- Paul A. Romitti
  - Department of Epidemiology, College of Public Health, The University of Iowa, Iowa City, Iowa
- James L. Mills
  - Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland
60
|
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
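To make the partition-and-label idea concrete, here is a minimal sketch, assuming a two-label hidden Markov model decoded with the Viterbi algorithm over a single binarized signal track. The abstract does not commit to a particular model; the label names, the probabilities, and the 200 bp bin size below are illustrative assumptions rather than the parameters of any specific SAGA tool.

```python
import math
from itertools import groupby

# Hypothetical labels and parameters, chosen only for illustration.
LABELS = ("quiescent", "active")
START = {"quiescent": 0.5, "active": 0.5}
TRANS = {
    "quiescent": {"quiescent": 0.9, "active": 0.1},
    "active": {"quiescent": 0.1, "active": 0.9},
}
# Probability of observing signal 0 (absent) or 1 (present) under each label.
EMIT = {
    "quiescent": {0: 0.9, 1: 0.1},
    "active": {0: 0.2, 1: 0.8},
}


def viterbi(signal):
    """Return the most probable label for each bin of a 0/1 signal."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][signal[0]]) for s in LABELS}
    back = []  # back[t][s] = best predecessor of label s at bin t + 1
    for obs in signal[1:]:
        prev, score, pointers = score, {}, {}
        for s in LABELS:
            best = max(LABELS, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final label.
    state = max(LABELS, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segments(path, bin_size=200):
    """Merge runs of identical labels into (start, end, label) segments."""
    out, start = [], 0
    for label, run in groupby(path):
        n = len(list(run))
        out.append((start * bin_size, (start + n) * bin_size, label))
        start += n
    return out
```

Here the parameters are fixed by hand, which makes the example supervised in spirit. In an unsupervised setting such as the one the abstract describes, they would instead be learned from the data, for example with expectation-maximization.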
Affiliation(s)
- Rachel C. W. Chan
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
61
Buggiotti L, Yurchenko AA, Yudin NS, Vander Jagt CJ, Vorobieva NV, Kusliy MA, Vasiliev SK, Rodionov AN, Boronetskaya OI, Zinovieva NA, Graphodatsky AS, Daetwyler HD, Larkin DM. Demographic History, Adaptation, and NRAP Convergent Evolution at Amino Acid Residue 100 in the World Northernmost Cattle from Siberia. Mol Biol Evol 2021; 38:3093-3110. [PMID: 33784744 PMCID: PMC8321547 DOI: 10.1093/molbev/msab078] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/17/2022]
Abstract
Native cattle breeds represent an important cultural heritage. They are a reservoir of genetic variation useful for properly responding to agriculture needs in the light of ongoing climate changes. Evolutionary processes that occur in response to extreme environmental conditions could also be better understood using adapted local populations. Herein, different evolutionary histories of the world northernmost native cattle breeds from Russia were investigated. They highlighted Kholmogory as a typical taurine cattle, whereas Yakut cattle separated from European taurines approximately 5,000 years ago and contain numerous ancestral and some novel genetic variants allowing their adaptation to harsh conditions of living above the Polar Circle. Scans for selection signatures pointed to several common gene pathways related to adaptation to harsh climates in both breeds. But genes affected by selection from these pathways were mostly different. A Yakut cattle breed-specific missense mutation in a highly conserved NRAP gene represents a unique example of a young amino acid residue convergent change shared with at least 16 species of hibernating/cold-adapted mammals from six distinct phylogenetic orders. This suggests a convergent evolution event along the mammalian phylogenetic tree and fast fixation in a single isolated cattle population exposed to a harsh climate.
Affiliation(s)
- Laura Buggiotti
  - Royal Veterinary College, University of London, London, United Kingdom
- Andrey A Yurchenko
  - The Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia
- Nikolay S Yudin
  - The Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia
- Nadezhda V Vorobieva
  - Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
- Mariya A Kusliy
  - Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
- Sergei K Vasiliev
  - Paleometal Archeology Department, Institute of Archaeology and Ethnography SB RAS, Novosibirsk, Russia
- Andrey N Rodionov
  - L.K. Ernst Federal Research Centre for Animal Husbandry, Podolsk, Russia
- Oksana I Boronetskaya
  - Moscow Agrarian Academy, Timiryazev Russian State Agrarian University, Moscow, Russia
- Alexander S Graphodatsky
  - Department of the Diversity and Evolution of Genomes, Institute of Molecular and Cellular Biology SB RAS, Novosibirsk, Russia
- Hans D Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Denis M Larkin
  - Royal Veterinary College, University of London, London, United Kingdom
  - The Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk, Russia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia
62
Stephens Z, Milosevic D, Kipp B, Grebe S, Iyer RK, Kocher JPA. PB-Motif: A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping. Front Genet 2021; 12:716586. [PMID: 34394200 PMCID: PMC8355628 DOI: 10.3389/fgene.2021.716586] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/28/2021] [Accepted: 07/05/2021] [Indexed: 12/30/2022]
Abstract
Long read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short read methods. These difficult to sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy with simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL using simulated reads sweeping over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with clinical diagnosis obtained from multiplex ligation-dependent amplification (MLPA) and Sanger sequencing. The source code is available at: github.com/zstephens/pb-motif.
Affiliation(s)
- Zachary Stephens
  - Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States
- Ravishankar K Iyer
  - Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States
63
Fabry MH, Falconio FA, Joud F, Lythgoe EK, Czech B, Hannon GJ. Maternally inherited piRNAs direct transient heterochromatin formation at active transposons during early Drosophila embryogenesis. eLife 2021; 10:e68573. [PMID: 34236313 PMCID: PMC8352587 DOI: 10.7554/elife.68573] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/19/2021] [Accepted: 07/07/2021] [Indexed: 12/12/2022]
Abstract
The PIWI-interacting RNA (piRNA) pathway controls transposon expression in animal germ cells, thereby ensuring genome stability over generations. In Drosophila, piRNAs are intergenerationally inherited through the maternal lineage, and this has demonstrated importance in the specification of piRNA source loci and in silencing of I- and P-elements in the germ cells of daughters. Maternally inherited Piwi protein enters somatic nuclei in early embryos prior to zygotic genome activation and persists therein for roughly half of the time required to complete embryonic development. To investigate the role of the piRNA pathway in the embryonic soma, we created a conditionally unstable Piwi protein. This enabled maternally deposited Piwi to be cleared from newly laid embryos within 30 min and well ahead of the activation of zygotic transcription. Examination of RNA and protein profiles over time, and correlation with patterns of H3K9me3 deposition, suggests a role for maternally deposited Piwi in attenuating zygotic transposon expression in somatic cells of the developing embryo. In particular, robust deposition of piRNAs targeting roo, an element whose expression is mainly restricted to embryonic development, results in the deposition of transient heterochromatic marks at active roo insertions. We hypothesize that roo, an extremely successful mobile element, may have adopted a lifestyle of expression in the embryonic soma to evade silencing in germ cells.
Affiliation(s)
- Martin H Fabry
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Federica A Falconio
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Fadwa Joud
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Emily K Lythgoe
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Benjamin Czech
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Gregory J Hannon
  - CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
64
Wang R, Lin DY, Jiang Y. SCOPE: A Normalization and Copy-Number Estimation Method for Single-Cell DNA Sequencing. Cell Syst 2020; 10:445-452.e6. [PMID: 32437686 DOI: 10.1016/j.cels.2020.03.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 10/17/2019] [Revised: 02/11/2020] [Accepted: 03/26/2020] [Indexed: 01/01/2023]
Abstract
Whole-genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy-number profiles at the cellular level. We propose SCOPE, a normalization and copy-number estimation method for the noisy scDNA-seq data. SCOPE's main features include the following: (1) a Poisson latent factor model for normalization, which borrows information across cells and regions to estimate bias, using in silico identified negative control cells; (2) an expectation-maximization algorithm embedded in the normalization step, which accounts for the aberrant copy-number changes and allows direct ploidy estimation without the need for post hoc adjustment; and (3) a cross-sample segmentation procedure to identify breakpoints that are shared across cells with the same genetic background. We evaluate SCOPE on a diverse set of scDNA-seq data in cancer genomics and show that SCOPE offers accurate copy-number estimates and successfully reconstructs subclonal structure. A record of this paper's transparent peer review process is included in the Supplemental Information.
Affiliation(s)
- Rujin Wang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
- Dan-Yu Lin
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
- Yuchao Jiang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
65
Peneder P, Stütz AM, Surdez D, Krumbholz M, Semper S, Chicard M, Sheffield NC, Pierron G, Lapouble E, Tötzl M, Ergüner B, Barreca D, Rendeiro AF, Agaimy A, Boztug H, Engstler G, Dworzak M, Bernkopf M, Taschner-Mandl S, Ambros IM, Myklebost O, Marec-Bérard P, Burchill SA, Brennan B, Strauss SJ, Whelan J, Schleiermacher G, Schaefer C, Dirksen U, Hutter C, Boye K, Ambros PF, Delattre O, Metzler M, Bock C, Tomazou EM. Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden. Nat Commun 2021; 12:3230. [PMID: 34050156 PMCID: PMC8163828 DOI: 10.1038/s41467-021-23445-w] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 11/19/2020] [Accepted: 04/29/2021] [Indexed: 12/19/2022]
Abstract
Sequencing of cell-free DNA in the blood of cancer patients (liquid biopsy) provides attractive opportunities for early diagnosis, assessment of treatment response, and minimally invasive disease monitoring. To unlock liquid biopsy analysis for pediatric tumors with few genetic aberrations, we introduce an integrated genetic/epigenetic analysis method and demonstrate its utility on 241 deep whole-genome sequencing profiles of 95 patients with Ewing sarcoma and 31 patients with other pediatric sarcomas. Our method achieves sensitive detection and classification of circulating tumor DNA in peripheral blood independent of any genetic alterations. Moreover, we benchmark different metrics for cell-free DNA fragmentation analysis, and we introduce the LIQUORICE algorithm for detecting circulating tumor DNA based on cancer-specific chromatin signatures. Finally, we combine several fragmentation-based metrics into an integrated machine learning classifier for liquid biopsy analysis that exploits widespread epigenetic deregulation and is tailored to cancers with low mutation rates. Clinical associations highlight the potential value of cfDNA fragmentation patterns as prognostic biomarkers in Ewing sarcoma. In summary, our study provides a comprehensive analysis of circulating tumor DNA beyond recurrent genetic aberrations, and it renders the benefits of liquid biopsy more readily accessible for childhood cancers.
Affiliation(s)
- Peter Peneder
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Adrian M Stütz
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Didier Surdez
  - INSERM U830, Équipe Labellisée LNCC, PSL Research University, SIREDO Oncology Centre, Institut Curie Research Centre, Paris, France
  - Balgrist University Hospital, University of Zurich, Zurich, Switzerland
- Manuela Krumbholz
  - Department of Pediatrics, University Hospital Erlangen, Erlangen, Germany
- Sabine Semper
  - Department of Pediatrics, University Hospital Erlangen, Erlangen, Germany
- Mathieu Chicard
  - INSERM U830, Équipe Labellisée LNCC, PSL Research University, SIREDO Oncology Centre, Institut Curie Research Centre, Paris, France
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Gaelle Pierron
  - Unité de Génétique Somatique, Service d'oncogénétique, Institut Curie, Centre Hospitalier, Paris, France
- Eve Lapouble
  - Unité de Génétique Somatique, Service d'oncogénétique, Institut Curie, Centre Hospitalier, Paris, France
- Marcus Tötzl
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Bekir Ergüner
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Daniele Barreca
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- André F Rendeiro
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Abbas Agaimy
  - Institute of Pathology, University Hospital Erlangen, Erlangen, Germany
- Heidrun Boztug
  - St. Anna Kinderspital, Department of Pediatrics, Medical University, Vienna, Austria
- Gernot Engstler
  - St. Anna Kinderspital, Department of Pediatrics, Medical University, Vienna, Austria
- Michael Dworzak
  - St. Anna Kinderspital, Department of Pediatrics, Medical University, Vienna, Austria
- Marie Bernkopf
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Inge M Ambros
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Ola Myklebost
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Perrine Marec-Bérard
  - Pediatric Department, Hematology and Oncology Pediatric Institute, Centre Léon Bérard, Lyon, France
- Susan Ann Burchill
  - Children's Cancer Research Group, Leeds Institute of Medical Research, St. James's University Hospital, Leeds, UK
- Bernadette Brennan
  - Department of Pediatric Oncology, Royal Manchester Children's Hospital, Manchester, UK
- Sandra J Strauss
  - Department of Oncology, UCL Cancer Institute, London, UK
  - Department of Oncology, University College London Hospital, London, UK
- Jeremy Whelan
  - Department of Oncology, University College London Hospital, London, UK
- Gudrun Schleiermacher
  - INSERM U830, Équipe Labellisée LNCC, PSL Research University, SIREDO Oncology Centre, Institut Curie Research Centre, Paris, France
- Christiane Schaefer
  - University Hospital Essen, Pediatrics III, West German Cancer Centre, Essen, Germany
- Uta Dirksen
  - University Hospital Essen, Pediatrics III, West German Cancer Centre, Essen, Germany
- Caroline Hutter
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
  - St. Anna Kinderspital, Department of Pediatrics, Medical University, Vienna, Austria
- Kjetil Boye
  - Department of Oncology, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, Norway
- Peter F Ambros
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Olivier Delattre
  - INSERM U830, Équipe Labellisée LNCC, PSL Research University, SIREDO Oncology Centre, Institut Curie Research Centre, Paris, France
  - Unité de Génétique Somatique, Service d'oncogénétique, Institut Curie, Centre Hospitalier, Paris, France
- Markus Metzler
  - Department of Pediatrics, University Hospital Erlangen, Erlangen, Germany
- Christoph Bock
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Institute of Artificial Intelligence, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- Eleni M Tomazou
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
66
Smolander J, Khan S, Singaravelu K, Kauko L, Lund RJ, Laiho A, Elo LL. Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data. BMC Genomics 2021; 22:357. [PMID: 34000988 PMCID: PMC8130438 DOI: 10.1186/s12864-021-07686-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/07/2020] [Accepted: 05/07/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method during the recent years. However, only a little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005-0.8×) data that is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection. RESULT Here, the performance of six popular read-depth based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validation were used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can accurately function. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was by far the slowest runtime among the methods (> 3 h) compared with FREEC (~ 3 min), which we considered the second-best method. CONCLUSIONS Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for the detection of large copy number variations when their length is in millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
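For a sense of scale, the coverage range quoted in this abstract can be turned into read counts using the standard estimate of mean coverage as total sequenced bases divided by genome size. The sketch below is only a back-of-the-envelope aid: the 100 bp read length, the 3.1 Gbp genome size, and the 1 Mbp bin size used in the usage note are assumed values, not figures taken from the study.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Mean coverage = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(coverage, read_length, genome_size):
    """Smallest whole number of reads that reaches the target coverage."""
    return math.ceil(coverage * genome_size / read_length)


def expected_reads_per_bin(coverage, bin_size, read_length):
    """Average number of reads starting in a bin of the given size."""
    return coverage * bin_size / read_length
```

Under these assumptions, 0.0005× coverage corresponds to roughly 15,500 reads genome-wide and about five reads per 1 Mbp bin, which helps explain why read-depth methods at this coverage are reliable only for CNVs spanning millions of base pairs.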
Affiliation(s)
- Johannes Smolander
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Sofia Khan
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Kalaimathy Singaravelu
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Leni Kauko
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Riikka J Lund
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Asta Laiho
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, 20520, Turku, Finland
67
Shah RN, Ruthenburg AJ. Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads. PLoS Comput Biol 2021; 17:e1008926. [PMID: 33872311 PMCID: PMC8084338 DOI: 10.1371/journal.pcbi.1008926] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/22/2020] [Revised: 04/29/2021] [Accepted: 03/30/2021] [Indexed: 11/18/2022]
Abstract
Next-generation sequencing (NGS) has transformed molecular biology and contributed to many seminal insights into genomic regulation and function. Apart from whole-genome sequencing, an NGS workflow involves alignment of the sequencing reads to the genome of study, after which the resulting alignments can be used for downstream analyses. However, alignment is complicated by the repetitive sequences; many reads align to more than one genomic locus, with 15-30% of the genome not being uniquely mappable by short-read NGS. This problem is typically addressed by discarding reads that do not uniquely map to the genome, but this practice can lead to systematic distortion of the data. Previous studies that developed methods for handling ambiguously mapped reads were often of limited applicability or were computationally intensive, hindering their broader usage. In this work, we present SmartMap: an algorithm that augments industry-standard aligners to enable usage of ambiguously mapped reads by assigning weights to each alignment with Bayesian analysis of the read distribution and alignment quality. SmartMap is computationally efficient, utilizing far fewer weighting iterations than previously thought necessary to process alignments and, as such, analyzing more than a billion alignments of NGS reads in approximately one hour on a desktop PC. By applying SmartMap to peak-type NGS data, including MNase-seq, ChIP-seq, and ATAC-seq in three organisms, we can increase read depth by up to 53% and increase the mapped proportion of the genome by up to 18% compared to analyses utilizing only uniquely mapped reads. We further show that SmartMap enables the analysis of more than 140,000 repetitive elements that could not be analyzed by traditional ChIP-seq workflows, and we utilize this method to gain insight into the epigenetic regulation of different classes of repetitive elements. 
These data emphasize both the dangers of discarding ambiguously mapped reads and their power for driving biological discovery.
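The general idea of weighting each candidate alignment of a multi-mapped read, rather than discarding the read, can be sketched as an iterative reweighting loop. This is not SmartMap's implementation: the update rule (alignment quality multiplied by the current weighted depth at the locus), the data layout, and the function name are simplifying assumptions chosen to illustrate the concept.

```python
def reweight(reads, n_iter=5):
    """Assign a weight to every candidate alignment of every read.

    reads is a list with one entry per read; each entry is a list of
    (locus, quality) tuples, where quality is a positive alignment score.
    Returns a parallel list of weight lists, each summing to 1.
    """
    # Start from weights proportional to alignment quality alone.
    weights = []
    for alignments in reads:
        total = sum(q for _, q in alignments)
        weights.append([q / total for _, q in alignments])

    for _ in range(n_iter):
        # Weighted read depth at each locus under the current weights.
        depth = {}
        for alignments, ws in zip(reads, weights):
            for (locus, _), w in zip(alignments, ws):
                depth[locus] = depth.get(locus, 0.0) + w
        # Re-estimate: loci supported by more reads attract more weight.
        updated = []
        for alignments in reads:
            raw = [q * depth[locus] for locus, q in alignments]
            total = sum(raw)
            updated.append([r / total for r in raw])
        weights = updated
    return weights
```

With two reads mapping uniquely to locus A and one read mapping equally well to A and B, the ambiguous read's weight shifts toward A over the iterations, while the uniquely mapped reads keep a weight of 1.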
Affiliation(s)
- Rohan N. Shah
  - Pritzker School of Medicine, Division of the Biological Sciences, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Molecular Biology and Cell Genetics, Division of the Biological Sciences, The University of Chicago, Chicago, Illinois, United States of America
- Alexander J. Ruthenburg
  - Department of Molecular Biology and Cell Genetics, Division of the Biological Sciences, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Biochemistry and Molecular Biology, Division of the Biological Sciences, The University of Chicago, Chicago, Illinois, United States of America
68
Munafò M, Lawless VR, Passera A, MacMillan S, Bornelöv S, Haussmann IU, Soller M, Hannon GJ, Czech B. Channel nuclear pore complex subunits are required for transposon silencing in Drosophila. eLife 2021; 10:e66321. [PMID: 33856346 PMCID: PMC8133776 DOI: 10.7554/elife.66321] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/07/2021] [Accepted: 04/14/2021] [Indexed: 12/21/2022]
Abstract
The nuclear pore complex (NPC) is the principal gateway between nucleus and cytoplasm that enables exchange of macromolecular cargo. Composed of multiple copies of ~30 different nucleoporins (Nups), the NPC acts as a selective portal, interacting with factors which individually license passage of specific cargo classes. Here we show that two Nups of the inner channel, Nup54 and Nup58, are essential for transposon silencing via the PIWI-interacting RNA (piRNA) pathway in the Drosophila ovary. In ovarian follicle cells, loss of Nup54 and Nup58 results in compromised piRNA biogenesis exclusively from the flamenco locus, whereas knockdowns of other NPC subunits have widespread consequences. This provides evidence that some Nups can acquire specialised roles in tissue-specific contexts. Our findings consolidate the idea that the NPC has functions beyond simply constituting a barrier to nuclear/cytoplasmic exchange as genomic loci subjected to strong selective pressure can exploit NPC subunits to facilitate their expression.
Affiliation(s)
- Marzia Munafò
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Victoria R Lawless
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Alessandro Passera
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Serena MacMillan
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Susanne Bornelöv
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Irmgard U Haussmann
  - Department of Life Science, Faculty of Health, Education and Life Sciences, Birmingham City University, Birmingham, United Kingdom
  - School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- Matthias Soller
  - School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
  - Birmingham Center for Genome Biology, University of Birmingham, Birmingham, United Kingdom
- Gregory J Hannon
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
- Benjamin Czech
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Cambridge, United Kingdom
69
Kaplinski L, Möls M, Puurand T, Pajuste FD, Remm M. KATK: Fast genotyping of rare variants directly from unmapped sequencing reads. Hum Mutat 2021; 42:777-786. [PMID: 33715282 DOI: 10.1002/humu.24197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 03/04/2021] [Accepted: 03/05/2021] [Indexed: 11/06/2022]
Abstract
KATK is a fast and accurate software tool for calling variants directly from raw next-generation sequencing reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning the retrieved reads locally. KATK does not use data about known polymorphisms and has NC (no call) as the default genotype. The reference or variant allele is called only if there is sufficient evidence for its presence in the data. Thus, it is not biased against rare variants or de novo mutations. With simulated datasets, we achieved a false-negative rate of 0.23% (sensitivity 99.77%) and a false discovery rate of 0.19%. Calling all human exonic regions with KATK requires 1-2 h, depending on sequencing coverage.
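The read-retrieval step described in this abstract can be illustrated with a minimal Python sketch. It is not KATK's implementation: the function names (`kmers`, `parse_fastq`, `retrieve_reads`), the in-memory set of target k-mers, and the omission of reverse-complement matching are all assumptions made for illustration, and the local alignment and genotype-calling steps are left out entirely.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Parse plain four-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip(), seq, qual


def retrieve_reads(lines: Iterable[str], targets: Set[str], k: int) -> Iterator[Read]:
    """Keep only reads that contain at least one predefined k-mer.

    Hypothetical simplification: forward strand only, exact matches only.
    """
    for read in parse_fastq(lines):
        if any(km in targets for km in kmers(read[1], k)):
            yield read


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
        "@r2", "TTTTTTTTTT", "+", "IIIIIIIIII",
    ]
    for header, seq, _ in retrieve_reads(fastq, {"GTACG"}, 5):
        print(header, seq)
```

Filtering on a small set of predefined k-mers means only a fraction of the reads ever reaches the more expensive alignment stage, which is consistent with the running time the abstract reports for raw FASTQ input.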
Affiliation(s)
- Lauris Kaplinski
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Märt Möls
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Tarmo Puurand
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Fanny-Dhelia Pajuste
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Maido Remm
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
|
70
|
Kim YS, Johnson GD, Seo J, Barrera A, Cowart TN, Majoros WH, Ochoa A, Allen AS, Reddy TE. Correcting signal biases and detecting regulatory elements in STARR-seq data. Genome Res 2021; 31:877-889. [PMID: 33722938 PMCID: PMC8092017 DOI: 10.1101/gr.269209.120] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/21/2020] [Accepted: 03/09/2021] [Indexed: 12/13/2022]
Abstract
High-throughput reporter assays such as self-transcribing active regulatory region sequencing (STARR-seq) have made it possible to measure regulatory element activity across the entire human genome at once. The resulting data, however, present substantial analytical challenges. Here, we identify technical biases that explain most of the variance in STARR-seq data. We then develop a statistical model to correct those biases and to improve detection of regulatory elements. This approach substantially improves precision and recall over current methods, improves detection of both activating and repressive regulatory elements, and controls for false discoveries despite strong local correlations in signal.
Affiliation(s)
- Young-Sook Kim
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
| | - Graham D Johnson
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA
| | - Jungkyun Seo
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
| | - Alejandro Barrera
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA
| | - Thomas N Cowart
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA
| | - William H Majoros
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
| | - Alejandro Ochoa
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
| | - Andrew S Allen
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
| | - Timothy E Reddy
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27710, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27710, USA.,Duke Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina 27710, USA.,Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27710, USA
|
71
|
The epigenetic pioneer EGR2 initiates DNA demethylation in differentiating monocytes at both stable and transient binding sites. Nat Commun 2021; 12:1556. [PMID: 33692344 PMCID: PMC7946903 DOI: 10.1038/s41467-021-21661-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 02/04/2021] [Indexed: 12/13/2022] Open
Abstract
The differentiation of human blood monocytes (MO), the post-mitotic precursors of macrophages (MAC) and dendritic cells (moDC), is accompanied by the active turnover of DNA methylation, but the extent, consequences and mechanisms of DNA methylation changes remain unclear. Here, we profile and compare epigenetic landscapes during IL-4/GM-CSF-driven MO differentiation across the genome and detect several thousand regions that are actively demethylated during culture, both with and without accompanying changes in chromatin accessibility or transcription factor (TF) binding. We further identify TFs that are globally associated with DNA demethylation processes. While interferon regulatory factor 4 (IRF4) is found to control hallmark dendritic cell functions with less impact on DNA methylation, early growth response 2 (EGR2) proves essential for MO differentiation as well as DNA methylation turnover at its binding sites. We also show that EGR2 interacts with the 5mC hydroxylase TET2, and its consensus binding sequences show a characteristic DNA methylation footprint at demethylated sites with or without detectable protein binding. Our findings reveal an essential role for EGR2 as an epigenetic pioneer in human MO and suggest that active DNA demethylation can be initiated by the TET2-recruiting TF at both stable and transient binding sites.
DNA methylation turnover is an essential epigenetic process during development. Here, the authors look at the changes in DNA methylation during the differentiation of post-mitotic human monocytes (MO), and find that EGR2 interacts with TET2 and is required for DNA demethylation at its binding sites, revealing EGR2 as an epigenetic pioneer factor in human MO.
|
72
|
Koo K, Mouradov D, Angel CM, Iseli TA, Wiesenfeld D, McCullough MJ, Burgess AW, Sieber OM. Genomic Signature of Oral Squamous Cell Carcinomas from Non-Smoking Non-Drinking Patients. Cancers (Basel) 2021; 13:cancers13051029. [PMID: 33804510 PMCID: PMC7957667 DOI: 10.3390/cancers13051029] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Revised: 02/15/2021] [Accepted: 02/22/2021] [Indexed: 12/14/2022] Open
Abstract
Simple Summary: A clinically distinct cohort of non-smoking non-drinking patients who develop oral cavity squamous cell carcinomas has been identified, with previous work suggesting that these patients tend to be older, female, and have poor outcomes. Our study characterised tumour molecular alterations in these patients, identifying differences in genomic profiles as compared to patients who smoke and/or drink. Associations between molecular alterations and other clinical and pathological characteristics were also explored.
Abstract: Molecular alterations in 176 patients with oral squamous cell carcinomas (OSCC) were evaluated to delineate differences in non-smoking non-drinking (NSND) patients. Somatic mutations and DNA copy number variations (CNVs) in a 68-gene panel and human papilloma virus (HPV) status were interrogated using targeted next-generation sequencing. In the entire cohort, TP53 (60%) and CDKN2A (24%) were most frequently mutated, and the most common CNVs were EGFR amplifications (9%) and deletions of BRCA2 (5%) and CDKN2A (4%). Significant associations were found for TP53 mutation and nodal disease, lymphovascular invasion and extracapsular spread, CDKN2A mutation or deletion with advanced tumour stage, and EGFR amplification with perineural invasion and extracapsular spread. PIK3CA mutation, CDKN2A deletion, and EGFR amplification were associated with worse survival in univariate analyses (p < 0.05 for all comparisons). There were 59 NSND patients who tended to be female and older than patients who smoke and/or drink, and showed enrichment of CDKN2A mutations, EGFR amplifications, and BRCA2 deletions (p < 0.05 for all comparisons), with a younger subset showing higher mutation burden. HPV was detected in three OSCC patients and not associated with smoking and drinking habits. NSND OSCC exhibits distinct genomic profiles and further exploration to elucidate the molecular aetiology in these patients is warranted.
Affiliation(s)
- Kendrick Koo
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Surgery, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
- Melbourne Dental School, The University of Melbourne, Carlton, VIC 3053, Australia
| | - Dmitri Mouradov
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, VIC 3052, Australia
| | | | - Tim A. Iseli
- Department of Surgery, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
| | - David Wiesenfeld
- Department of Surgery, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
- Melbourne Dental School, The University of Melbourne, Carlton, VIC 3053, Australia
| | | | - Antony W. Burgess
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Surgery, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
| | - Oliver M. Sieber
- Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Surgery, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3050, Australia
- Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
|
73
|
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, Makova KD. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome. Nucleic Acids Res 2021; 49:1497-1516. [PMID: 33450015 PMCID: PMC7897504 DOI: 10.1093/nar/gkaa1269] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2020] [Revised: 12/14/2020] [Accepted: 01/11/2021] [Indexed: 12/12/2022] Open
Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.
Affiliation(s)
- Wilfried M Guiblet
- Bioinformatics and Genomics Graduate Program, Penn State University, University Park, PA 16802, USA
| | - Marzia A Cremona
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Operations and Decision Systems, Université Laval, Canada
- CHU de Québec – Université Laval Research Center, Canada
| | - Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Di Chen
- Intercollege Graduate Degree Program in Genetics, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
| | - Kristin A Eckert
- Department of Pathology, Penn State University, College of Medicine, Hershey, PA 17033, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
| | - Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
| | - Yi-Fei Huang
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
| | - Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
|
74
|
Wang C, Wallerman O, Arendt ML, Sundström E, Karlsson Å, Nordin J, Mäkeläinen S, Pielberg GR, Hanson J, Ohlsson Å, Saellström S, Rönnberg H, Ljungvall I, Häggström J, Bergström TF, Hedhammar Å, Meadows JRS, Lindblad-Toh K. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun Biol 2021; 4:185. [PMID: 33568770 PMCID: PMC7875987 DOI: 10.1038/s42003-021-01698-x] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2020] [Accepted: 12/17/2020] [Indexed: 12/13/2022] Open
Abstract
We present GSD_1.0, a high-quality domestic dog reference genome with chromosome length scaffolds and contiguity increased 55-fold over CanFam3.1. Annotation with generated and existing long and short read RNA-seq, miRNA-seq and ATAC-seq, revealed that 32.1% of lifted over CanFam3.1 gaps harboured previously hidden functional elements, including promoters, genes and miRNAs in GSD_1.0. A catalogue of canine "dark" regions was made to facilitate mapping rescue. Alignment in these regions is difficult, but we demonstrate that they harbour trait-associated variation. Key genomic regions were completed, including the Dog Leucocyte Antigen (DLA), T Cell Receptor (TCR) and 366 COSMIC cancer genes. 10x linked-read sequencing of 27 dogs (19 breeds) uncovered 22.1 million SNPs, indels and larger structural variants. Subsequent intersection with protein coding genes showed that 1.4% of these could directly influence gene products, and so provide a source of normal or aberrant phenotypic modifications.
Affiliation(s)
- Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
| | - Ola Wallerman
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Maja-Louise Arendt
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Veterinary Clinical Sciences, University of Copenhagen, Frederiksberg D, Denmark
| | - Elisabeth Sundström
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jessika Nordin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Suvi Mäkeläinen
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Gerli Rosengren Pielberg
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jeanette Hanson
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åsa Ohlsson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Sara Saellström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Henrik Rönnberg
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Ingrid Ljungvall
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jens Häggström
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Tomas F Bergström
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Åke Hedhammar
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
75
|
Chen C, Hou J, Shi X, Yang H, Birchler JA, Cheng J. DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinformatics 2021; 22:38. [PMID: 33522898 PMCID: PMC7852092 DOI: 10.1186/s12859-020-03952-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 12/29/2020] [Indexed: 12/21/2022] Open
Abstract
Background: Due to the complexity of biological systems, predicting the potential DNA binding sites of transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide information about the affinity and accessibility of the genome and are commonly used features in binding site prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach to binding site inference from massively parallel sequencing data. The successful applications of the attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors.
Results: In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction based on the combination of two components: a single attention module and a pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN.
Conclusions: DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
Affiliation(s)
- Chen Chen
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
| | - Xiaowen Shi
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Hua Yang
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - James A Birchler
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA.
|
76
|
Zhuo X, Du AY, Pehrsson EC, Li D, Wang T. Epigenomic differences in the human and chimpanzee genomes are associated with structural variation. Genome Res 2021; 31:279-290. [PMID: 33303495 PMCID: PMC7849402 DOI: 10.1101/gr.263491.120] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2020] [Accepted: 12/03/2020] [Indexed: 12/15/2022]
Abstract
Structural variation (SV), including insertions and deletions (indels), is a primary mechanism of genome evolution. However, the mechanism by which SV contributes to epigenome evolution is poorly understood. In this study, we characterized the association between lineage-specific indels and epigenome differences between human and chimpanzee to investigate how SVs might have shaped the epigenetic landscape. By intersecting medium-to-large human-chimpanzee indels (20 bp-50 kb) with putative promoters and enhancers in cranial neural crest cells (CNCCs) and repressed regions in induced pluripotent stem cells (iPSCs), we found that 12% of indels overlap putative regulatory and repressed regions (RRRs), and 15% of these indels are associated with lineage-biased RRRs. Indel-associated putative enhancer and repressive regions are approximately 1.3 times and approximately three times as likely to be lineage-biased, respectively, as those not associated with indels. We found a twofold enrichment of medium-sized indels (20-50 bp) in CpG island (CGI)-containing promoters relative to chance expectation. Lastly, from human-specific transposable element insertions, we identified putative regulatory elements, including NR2F1-bound putative CNCC enhancers derived from SVAs and putative iPSC promoters derived from LTR5s. Our results show that different types of indels are associated with specific epigenomic diversity between human and chimpanzee.
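The intersection step mentioned in this abstract (counting indels that overlap annotated regions) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: intervals are taken to be half-open `(chrom, start, end)` tuples, the helper names `build_index`, `overlaps` and `fraction_overlapping` are hypothetical, and the example coordinates are invented.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open [start, end)


def build_index(regions: Sequence[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Merge overlapping regions per chromosome; return sorted starts and ends."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[Tuple[int, int]] = []
        for s, e in ivs:
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom: str, start: int, end: int) -> bool:
    """True if [start, end) shares at least one base with any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that begins before the query ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def fraction_overlapping(indels: Sequence[Interval], regions: Sequence[Interval]) -> float:
    """Fraction of indels that overlap at least one region."""
    index = build_index(regions)
    hits = sum(overlaps(index, c, s, e) for c, s, e in indels)
    return hits / len(indels) if indels else 0.0


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    indels = [("chr1", 90, 100), ("chr1", 290, 310), ("chr2", 55, 56), ("chr3", 1, 2)]
    print(fraction_overlapping(indels, regions))
```

Merging the regions first keeps the interval ends sorted, so a single binary search per indel is enough to decide whether it overlaps anything.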
Affiliation(s)
- Xiaoyu Zhuo
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Alan Y Du
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Erica C Pehrsson
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Daofeng Li
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| | - Ting Wang
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri 63110, USA
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
|
77
|
Schwabl P, Boité MC, Bussotti G, Jacobs A, Andersson B, Moreira O, Freitas-Mesquita AL, Meyer-Fernandes JR, Telleria EL, Traub-Csekö Y, Vaselek S, Leštinová T, Volf P, Morgado FN, Porrozzi R, Llewellyn M, Späth GF, Cupolillo E. Colonization and genetic diversification processes of Leishmania infantum in the Americas. Commun Biol 2021; 4:139. [PMID: 33514858 PMCID: PMC7846609 DOI: 10.1038/s42003-021-01658-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2020] [Accepted: 01/04/2021] [Indexed: 12/30/2022] Open
Abstract
Leishmania infantum causes visceral leishmaniasis, a deadly vector-borne disease introduced to the Americas during the colonial era. This non-native trypanosomatid parasite has since established widespread transmission cycles using alternative vectors, and human infection has become a significant concern to public health, especially in Brazil. A multi-kilobase deletion was recently detected in Brazilian L. infantum genomes and is suggested to reduce susceptibility to the anti-leishmanial drug miltefosine. We show that deletion-carrying strains occur in at least 15 Brazilian states and describe diversity patterns suggesting that these derive from common ancestral mutants rather than from recurrent independent mutation events. We also show that the deleted locus and associated enzymatic activity is restored by hybridization with non-deletion type strains. Genetic exchange appears common in areas of secondary contact but also among closely related parasites. We examine demographic and ecological scenarios underlying this complex L. infantum population structure and discuss implications for disease control.
Philipp Schwabl, Mariana Boité, and colleagues analyze 126 Leishmania infantum genomes to determine how demographic and selective consequences of the parasite’s invasive history have contributed to intricate population genetic heterogeneity across Brazil. Their data suggest a complex interplay of population expansion, secondary contact and genetic exchange events underlying diversity patterns at short and long-distance scales. These processes also appear pivotal to the proliferation of a drug resistance-associated multi-gene deletion on chromosome 31.
Affiliation(s)
- Philipp Schwabl
- School of Life Sciences, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, G12 8QQ, Glasgow, UK
| | - Mariana C Boité
- Laboratório de Pesquisa em Leishmaniose, Instituto Oswaldo Cruz, FIOCRUZ, 21040-365, Rio de Janeiro, Brazil.
| | - Giovanni Bussotti
- Institut Pasteur-Bioinformatics and Biostatistics Hub-C3BI, USR 3756 IP CNRS, 75015, Paris, France.,Department of Parasites and Insect Vectors, Institut Pasteur, INSERM U1201, Unité de Parasitology moléculaire et Signalisation, 75015, Paris, France
| | - Arne Jacobs
- School of Life Sciences, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, G12 8QQ, Glasgow, UK
| | - Bjorn Andersson
- Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, Biomedicum 9C, 171 77, Stockholm, Sweden
| | - Otacilio Moreira
- Laboratório de Biologia Molecular e Doenças Endêmicas, Instituto Oswaldo Cruz, Fiocruz, 21040-365, Rio de Janeiro, RJ, Brazil
| | - Anita L Freitas-Mesquita
- Instituto de Bioquímica Médica Leopoldo de Meis (IBqM), Universidade Federal do Rio de Janeiro (UFRJ), 21941-590, Rio de Janeiro, RJ, Brazil
| | - Jose Roberto Meyer-Fernandes
- Instituto de Bioquímica Médica Leopoldo de Meis (IBqM), Universidade Federal do Rio de Janeiro (UFRJ), 21941-590, Rio de Janeiro, RJ, Brazil
| | - Erich L Telleria
- Laboratório de Biologia Molecular de Parasitas e Vetores, Instituto Oswaldo Cruz, 21040-365, Rio de Janeiro, Brazil.,Faculty of Science, Department of Parasitology, Charles University, 128 44, Prague, Czech Republic
| | - Yara Traub-Csekö
- Laboratório de Biologia Molecular de Parasitas e Vetores, Instituto Oswaldo Cruz, 21040-365, Rio de Janeiro, Brazil
| | - Slavica Vaselek
- Faculty of Science, Department of Parasitology, Charles University, 128 44, Prague, Czech Republic
| | - Tereza Leštinová
- Faculty of Science, Department of Parasitology, Charles University, 128 44, Prague, Czech Republic
| | - Petr Volf
- Faculty of Science, Department of Parasitology, Charles University, 128 44, Prague, Czech Republic
| | - Fernanda N Morgado
- Laboratório de Pesquisa em Leishmaniose, Instituto Oswaldo Cruz, FIOCRUZ, 21040-365, Rio de Janeiro, Brazil
| | - Renato Porrozzi
- Laboratório de Pesquisa em Leishmaniose, Instituto Oswaldo Cruz, FIOCRUZ, 21040-365, Rio de Janeiro, Brazil
| | - Martin Llewellyn
- School of Life Sciences, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, G12 8QQ, Glasgow, UK
| | - Gerald F Späth
- Department of Parasites and Insect Vectors, Institut Pasteur, INSERM U1201, Unité de Parasitology moléculaire et Signalisation, 75015, Paris, France
| | - Elisa Cupolillo
- Laboratório de Pesquisa em Leishmaniose, Instituto Oswaldo Cruz, FIOCRUZ, 21040-365, Rio de Janeiro, Brazil
| |
Collapse
|
78
|
Gignoux-Wolfsohn SA, Pinsky ML, Kerwin K, Herzog C, Hall M, Bennett AB, Fefferman NH, Maslo B. Genomic signatures of selection in bats surviving white-nose syndrome. Mol Ecol 2021; 30:5643-5657. [PMID: 33476441 DOI: 10.1111/mec.15813] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/11/2020] [Revised: 01/13/2021] [Accepted: 01/14/2021] [Indexed: 02/06/2023]
Abstract
Rapid evolution of advantageous traits following abrupt environmental change can help populations recover from demographic decline. However, for many introduced diseases affecting longer-lived, slower reproducing hosts, mortality is likely to outpace the acquisition of adaptive de novo mutations. Adaptive alleles must therefore be selected from standing genetic variation, a process that leaves few detectable genomic signatures. Here, we present whole genome evidence for selection in bat populations that are recovering from white-nose syndrome (WNS). We collected samples both during and after a WNS-induced mass mortality event in two little brown bat populations that are beginning to show signs of recovery and found signatures of soft sweeps from standing genetic variation at multiple loci throughout the genome. We identified one locus putatively under selection in a gene associated with the immune system. Multiple loci putatively under selection were located within genes previously linked to host response to WNS as well as to changes in metabolism during hibernation. Results from two additional populations suggested that loci under selection may differ somewhat among populations. Through these findings, we suggest that WNS-induced selection may contribute to genetic resistance in this slowly reproducing species threatened with extinction.
Affiliation(s)
- Sarah A Gignoux-Wolfsohn
  - Department of Ecology, Evolution, and Natural Resources, Rutgers The State University of New Jersey, New Brunswick, NJ, USA
- Malin L Pinsky
  - Department of Ecology, Evolution, and Natural Resources, Rutgers The State University of New Jersey, New Brunswick, NJ, USA
- Kathleen Kerwin
  - Department of Ecology, Evolution, and Natural Resources, Rutgers The State University of New Jersey, New Brunswick, NJ, USA
- Carl Herzog
  - New York State Department of Environmental Conservation, Albany, NY, USA
- MacKenzie Hall
  - Endangered and Nongame Species Program, New Jersey Department of Environmental Protection, Trenton, NJ, USA
- Nina H Fefferman
  - Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN, USA
  - National Institute for Mathematical and Biological Synthesis, University of Tennessee, Tennessee, TN, USA
- Brooke Maslo
  - Department of Ecology, Evolution, and Natural Resources, Rutgers The State University of New Jersey, New Brunswick, NJ, USA
|
79
|
Abstract
Applying next-generation sequencing to liquid biopsy, and to urine in particular, requires bioinformatics methods tailored to the peculiarities of these samples. Many aspects of cancer can be characterized from nucleic acids, especially cell-free DNA and circulating tumor DNA. It is possible to detect small mutations, such as single nucleotide variants and small insertions and deletions, as well as copy-number alterations and epigenetic profiles. Because circulating tumor DNA makes up only a small fraction of total cell-free DNA, dedicated methods have been developed. One of them is to tag each DNA fragment with a unique barcode in order to lower the limit of detection of cancer-related variants. Some bioinformatics workflows and tools are the same as those used in a classic analysis of tumor tissue, but some steps require specific algorithms.
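The barcoding idea can be sketched as follows: reads carrying the same barcode are assumed to derive from one original fragment, so a per-position majority vote across them suppresses isolated sequencing errors. This is a minimal illustration, not a specific published tool; the function name `umi_consensus`, the `min_family_size` parameter, and the input format (pairs of barcode and read sequence, already aligned to the same start) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Collapse reads sharing a barcode into one consensus sequence each.

    reads: iterable of (barcode, sequence) pairs (hypothetical input format).
    Families smaller than min_family_size are dropped, since a single read
    cannot be error-corrected by voting.
    """
    # Group read sequences by their barcode.
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        # Vote only over positions covered by every read in the family.
        length = min(len(s) for s in seqs)
        consensus[barcode] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


if __name__ == "__main__":
    example = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACTT"),  # one read with a likely sequencing error
        ("GGC", "ACGA"),  # singleton family, dropped
    ]
    print(umi_consensus(example))
```

In this sketch the mismatching base in the third read is outvoted by the other two reads of its family, and the singleton family is discarded; real pipelines typically also tolerate errors within the barcode itself and weigh base qualities.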
|
80
|
Abstract
Pseudogenes are commonly labeled as "junk DNA" given their perceived nonfunctional status. However, the advent of large-scale genomics projects prompted a revisit of pseudogene biology, highlighting their key functional and regulatory roles in numerous diseases, including cancers. Integrative analyses of cancer data have shown that pseudogenes can be transcribed and even translated, and that pseudogenic DNA, RNA, and proteins can interfere with the activity and function of key protein coding genes, acting as regulators of oncogenes and tumor suppressors. Capitalizing on the available clinical research, we are able to get an insight into the spread and variety of pseudogene biomarker and therapeutic potential. In this chapter, we describe pseudogenes that fulfill their role as diagnostic or prognostic biomarkers, both as unique elements and in collaboration with other genes or pseudogenes. We also report that the majority of prognostic pseudogenes are overexpressed and exert an oncogenic role in colorectal, liver, lung, and gastric cancers. Finally, we highlight a number of pseudogenes that can establish future therapeutic avenues.
|
81
|
Arega Y, Jiang H, Wang S, Zhang J, Niu X, Li G. ChIAMM: A Mixture Model for Statistical Analysis of Long-Range Chromatin Interactions From ChIA-PET Experiments. Front Genet 2021; 11:616160. [PMID: 33381154 PMCID: PMC7767989 DOI: 10.3389/fgene.2020.616160] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Accepted: 11/11/2020] [Indexed: 11/13/2022] Open
Abstract
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) is an important experimental method for detecting specific protein-mediated chromatin loops genome-wide at high resolution. Here, we proposed a new statistical approach with a mixture model, chromatin interaction analysis using mixture model (ChIAMM), to detect significant chromatin interactions from ChIA-PET data. The statistical model is cast into a Bayesian framework to account for additional systematic biases: genomic distance, local enrichment, mappability, and GC content. Using different ChIA-PET datasets, we evaluated the performance of ChIAMM and compared it with existing methods, including ChIA-PET Tool, ChiaSig, Mango, ChIA-PET2, and ChIAPoP. The results showed that the new approach performed better than most of the leading existing methods in detecting significant chromatin interactions in ChIA-PET experiments.
Affiliation(s)
- Yibeltal Arega
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Hao Jiang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Shuangqi Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Jingwen Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Xiaohui Niu
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Guoliang Li
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
|
82
|
Pockrandt C, Alzamel M, Iliopoulos CS, Reinert K. GenMap: ultra-fast computation of genome mappability. Bioinformatics 2020; 36:3687-3692. [PMID: 32246826 PMCID: PMC7320602 DOI: 10.1093/bioinformatics/btaa222] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 12/14/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches.

RESULTS: We present a fast method, GenMap, to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position.

AVAILABILITY AND IMPLEMENTATION: GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.
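The (k, e)-mappability definition in this abstract can be illustrated with a brute-force sketch. This is not GenMap's algorithm, which is designed to be far faster; it simply counts, for each position, the k-mers of the same sequence that lie within Hamming distance e, and takes the reciprocal. The function names `hamming` and `mappability` are hypothetical, and the sketch assumes a single sequence and the forward strand only.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mappability(genome, k, e):
    """Brute-force (k, e)-mappability for every k-mer start position.

    For position i, the value is 1 / c, where c is the number of k-mers in
    `genome` that match the k-mer at i with at most e mismatches (the k-mer
    at i itself is always counted, so c >= 1).
    Runs in quadratic time; intended for illustration on short strings only.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    values = []
    for query in kmers:
        count = sum(1 for target in kmers if hamming(query, target) <= e)
        values.append(1.0 / count)
    return values


if __name__ == "__main__":
    # ACGT occurs at positions 0 and 4, so those positions get 1/2.
    print(mappability("ACGTACGT", 4, 0))
```

For example, `mappability("ACGTACGT", 4, 0)` gives 0.5 at the first and last positions, because ACGT occurs twice, and 1.0 at the three positions in between. Allowing mismatches can only increase the counts, so values for e = 1 are never higher than for e = 0.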
Affiliation(s)
- Christopher Pockrandt
  - Center for Computational Biology, School of Medicine
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science and Mathematics, Freie Universität Berlin
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Mai Alzamel
  - Department of Informatics, King's College London, London, UK
  - Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
- Knut Reinert
  - Department of Computer Science and Mathematics, Freie Universität Berlin
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
83
|
Lee D, Shi M, Moran J, Wall M, Zhang J, Liu J, Fitzgerald D, Kyono Y, Ma L, White KP, Gerstein M. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol 2020; 21:298. [PMID: 33292397 PMCID: PMC7722316 DOI: 10.1186/s13059-020-02194-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/02/2019] [Accepted: 11/04/2020] [Indexed: 12/11/2022] Open
Abstract
STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. An issue with the increased complexity and depth is that the coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content. Furthermore, STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed a negative binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to comprehensively and unbiasedly call enhancers in them.
Affiliation(s)
- Donghoon Lee
  - Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Manman Shi
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - Tempus Labs, Inc., Chicago, IL, 60654, USA
- Jennifer Moran
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - Tempus Labs, Inc., Chicago, IL, 60654, USA
- Martha Wall
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - Tempus Labs, Inc., Chicago, IL, 60654, USA
- Jing Zhang
  - School of Information and Computer Sciences, University of California, Irvine, CA, 92697, USA
- Jason Liu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Dominic Fitzgerald
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
- Yasuhiro Kyono
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - Tempus Labs, Inc., Chicago, IL, 60654, USA
- Lijia Ma
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - School of Life Sciences, Westlake University, Hangzhou, 310024, Zhejiang, China
- Kevin P White
  - Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
  - Tempus Labs, Inc., Chicago, IL, 60654, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
  - Department of Computer Science, Yale University, New Haven, CT, 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT, 06520, USA
|
84
|
Schwabl P, Maiguashca Sánchez J, Costales JA, Ocaña-Mayorga S, Segovia M, Carrasco HJ, Hernández C, Ramírez JD, Lewis MD, Grijalva MJ, Llewellyn MS. Culture-free genome-wide locus sequence typing (GLST) provides new perspectives on Trypanosoma cruzi dispersal and infection complexity. PLoS Genet 2020; 16:e1009170. [PMID: 33326438 PMCID: PMC7743988 DOI: 10.1371/journal.pgen.1009170] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/02/2020] [Accepted: 10/02/2020] [Indexed: 12/30/2022] Open
Abstract
Analysis of genetic polymorphism is a powerful tool for epidemiological surveillance and research. Powerful inference from pathogen genetic variation, however, is often restrained by limited access to representative target DNA, especially in the study of obligate parasitic species for which ex vivo culture is resource-intensive or bias-prone. Modern sequence capture methods enable pathogen genetic variation to be analyzed directly from host/vector material but are often too complex and expensive for resource-poor settings where infectious diseases prevail. This study proposes a simple, cost-effective 'genome-wide locus sequence typing' (GLST) tool based on massive parallel amplification of information hotspots throughout the target pathogen genome. The multiplexed polymerase chain reaction amplifies hundreds of different, user-defined genetic targets in a single reaction tube, and subsequent agarose gel-based clean-up and barcoding completes library preparation at under 4 USD per sample. Our study generates a flexible GLST primer panel design workflow for Trypanosoma cruzi, the parasitic agent of Chagas disease. We successfully apply our 203-target GLST panel to direct, culture-free metagenomic extracts from triatomine vectors containing a minimum of 3.69 pg/μl T. cruzi DNA and further elaborate on method performance by sequencing GLST libraries from T. cruzi reference clones representing discrete typing units (DTUs) TcI, TcIII, TcIV, TcV and TcVI. The 780 SNP sites we identify in the sample set repeatably distinguish parasites infecting sympatric vectors and detect correlations between genetic and geographic distances at regional (< 150 km) as well as continental scales. The markers also clearly separate TcI, TcIII, TcIV and TcV + TcVI and appear to distinguish multiclonal infections within TcI. We discuss the advantages, limitations and prospects of our method across a spectrum of epidemiological research.
Affiliation(s)
- Philipp Schwabl
  - Institute of Biodiversity, Animal Health & Comparative Medicine, University of Glasgow, Glasgow, United Kingdom
- Jalil Maiguashca Sánchez
  - Centro de Investigación para la Salud en América Latina, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- Jaime A. Costales
  - Centro de Investigación para la Salud en América Latina, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- Sofía Ocaña-Mayorga
  - Centro de Investigación para la Salud en América Latina, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- Maikell Segovia
  - Laboratorio de Biología Molecular de Protozoarios, Instituto de Medicina Tropical, Universidad Central de Venezuela, Caracas, Venezuela
- Hernán J. Carrasco
  - Laboratorio de Biología Molecular de Protozoarios, Instituto de Medicina Tropical, Universidad Central de Venezuela, Caracas, Venezuela
- Carolina Hernández
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Juan David Ramírez
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Michael D. Lewis
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, United Kingdom
- Mario J. Grijalva
  - Centro de Investigación para la Salud en América Latina, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
  - Infectious and Tropical Disease Institute, Biomedical Sciences Department, Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, United States of America
- Martin S. Llewellyn
  - Institute of Biodiversity, Animal Health & Comparative Medicine, University of Glasgow, Glasgow, United Kingdom
|
85
|
Kautt AF, Kratochwil CF, Nater A, Machado-Schiaffino G, Olave M, Henning F, Torres-Dowdall J, Härer A, Hulsey CD, Franchini P, Pippel M, Myers EW, Meyer A. Contrasting signatures of genomic divergence during sympatric speciation. Nature 2020; 588:106-111. [PMID: 33116308 PMCID: PMC7759464 DOI: 10.1038/s41586-020-2845-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 03/30/2020] [Accepted: 07/23/2020] [Indexed: 01/25/2023]
Abstract
The transition from 'well-marked varieties' of a single species into 'well-defined species', especially in the absence of geographic barriers to gene flow (sympatric speciation), has puzzled evolutionary biologists ever since Darwin [1,2]. Gene flow counteracts the buildup of genome-wide differentiation, which is a hallmark of speciation and increases the likelihood of the evolution of irreversible reproductive barriers (incompatibilities) that complete the speciation process [3]. Theory predicts that the genetic architecture of divergently selected traits can influence whether sympatric speciation occurs [4], but empirical tests of this theory are scant because comprehensive data are difficult to collect and synthesize across species, owing to their unique biologies and evolutionary histories [5]. Here, within a young species complex of neotropical cichlid fishes (Amphilophus spp.), we analysed genomic divergence among populations and species. By generating a new genome assembly and re-sequencing 453 genomes, we uncovered the genetic architecture of traits that have been suggested to be important for divergence. Species that differ in monogenic or oligogenic traits that affect ecological performance and/or mate choice show remarkably localized genomic differentiation. By contrast, differentiation among species that have diverged in polygenic traits is genomically widespread and much higher overall, consistent with the evolution of effective and stable genome-wide barriers to gene flow. Thus, we conclude that simple trait architectures are not always as conducive to speciation with gene flow as previously suggested, whereas polygenic architectures can promote rapid and stable speciation in sympatry.
Affiliation(s)
- Andreas F Kautt
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Alexander Nater
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Gonzalo Machado-Schiaffino
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Department of Functional Biology, Area of Genetics, University of Oviedo, Oviedo, Spain
- Melisa Olave
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Argentine Dryland Research Institute of the National Council for Scientific Research (IADIZA-CONICET), Mendoza, Argentina
- Frederico Henning
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Department of Genetics, Institute of Biology, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
- Andreas Härer
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Division of Biological Sciences, Section of Ecology, Behavior & Evolution, University of California San Diego, La Jolla, CA, USA
- C Darrin Hulsey
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Paolo Franchini
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Martin Pippel
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Eugene W Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Axel Meyer
  - Department of Biology, University of Konstanz, Konstanz, Germany
|
86
|
Fu L, Zhang L, Dollinger E, Peng Q, Nie Q, Xie X. Predicting transcription factor binding in single cells through deep learning. Science Advances 2020; 6:eaba9031. [PMID: 33355120 PMCID: PMC11206197 DOI: 10.1126/sciadv.aba9031] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/22/2020] [Accepted: 10/29/2020] [Indexed: 06/12/2023]
Abstract
Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric "TF activity score" to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.
Affiliation(s)
- Laiyi Fu
  - Systems Engineering Institute, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shannxi 710049, China
  - Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
- Lihua Zhang
  - Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
- Emmanuel Dollinger
  - Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
- Qinke Peng
  - Systems Engineering Institute, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shannxi 710049, China
- Qing Nie
  - Department of Mathematics, University of California, Irvine, Irvine, CA 92697, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
- Xiaohui Xie
  - Department of Computer Science, University of California, Irvine, Irvine, CA 92697, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
|
87
|
Durrant C, Thiele EA, Holroyd N, Doyle SR, Sallé G, Tracey A, Sankaranarayanan G, Lotkowska ME, Bennett HM, Huckvale T, Abdellah Z, Tchindebet O, Wossen M, Logora MSY, Coulibaly CO, Weiss A, Schulte-Hostedde AI, Foster JM, Cleveland CA, Yabsley MJ, Ruiz-Tiben E, Berriman M, Eberhard ML, Cotton JA. Population genomic evidence that human and animal infections in Africa come from the same populations of Dracunculus medinensis. PLoS Negl Trop Dis 2020; 14:e0008623. [PMID: 33253172 PMCID: PMC7728184 DOI: 10.1371/journal.pntd.0008623] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/21/2019] [Revised: 12/10/2020] [Accepted: 07/22/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Guinea worm (Dracunculus medinensis) was historically one of the major parasites of humans and has been known since antiquity. Now, Guinea worm is on the brink of eradication, as efforts to interrupt transmission have reduced the annual burden of disease from millions of infections per year in the 1980s to only 54 human cases reported globally in 2019. Despite the enormous success of eradication efforts to date, one complication has arisen. Over the last few years, hundreds of dogs have been found infected with this previously apparently anthroponotic parasite, almost all in Chad. Moreover, the relative numbers of infections in humans and dogs suggest that dogs are currently the principal reservoir of infection and key to maintaining transmission in that country.

PRINCIPAL FINDINGS: In an effort to shed light on this peculiar epidemiology of Guinea worm in Chad, we have sequenced and compared the genomes of worms from dog, human and other animal infections. Confirming previous work with other molecular markers, we show that all of these worms are D. medinensis and that the same population of worms is causing both human and dog infections. We also confirm the suspected transmission between host species and detect signs of a population bottleneck due to the eradication efforts. The diversity of worms in Chad appears to exclude the possibility that there were no, or very few, worms present in the country during a 10-year absence of reported cases.

CONCLUSIONS: This work reinforces the importance of adequate surveillance of both human and dog populations in the Guinea worm eradication campaign and suggests that control programs aiming to interrupt disease transmission should stay aware of the possible emergence of unusual epidemiology as pathogens approach elimination.
Collapse
Affiliation(s)
- Caroline Durrant
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Elizabeth A. Thiele
- Department of Biology, Vassar College, Poughkeepsie, New York, United States of America
| | - Nancy Holroyd
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Stephen R. Doyle
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Guillaume Sallé
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- INRA—U. Tours, UMR 1282 ISP Infectiologie et Santé Publique, Nouzilly, France
| | - Alan Tracey
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Geetha Sankaranarayanan
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Magda E. Lotkowska
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Hayley M. Bennett
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Present Address: Berkeley Lights Inc., Emeryville, California, United States of America
| | - Thomas Huckvale
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Zahra Abdellah
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
| | - Ouakou Tchindebet
- Guinea Worm Eradication Program, The Carter Center, Atlanta, Georgia, United States of America
| | - Mesfin Wossen
- Guinea Worm Eradication Program, The Carter Center, Atlanta, Georgia, United States of America
| | | | - Cheick Oumar Coulibaly
- Guinea Worm Eradication Program, The Carter Center, Atlanta, Georgia, United States of America
| | - Adam Weiss
- Guinea Worm Eradication Program, The Carter Center, Atlanta, Georgia, United States of America
| | | | - Jeremy M. Foster
- New England Biolabs, Ipswich, Massachusetts, United States of America
| | - Christopher A. Cleveland
- Southeastern Cooperative Wildlife Disease Study, Department of Population Health, Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
| | - Michael J. Yabsley
- Southeastern Cooperative Wildlife Disease Study, Department of Population Health, Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
- Warnell School of Forestry and Natural Resources, University of Georgia, Athens, Georgia, United States of America
| | - Ernesto Ruiz-Tiben
- Guinea Worm Eradication Program, The Carter Center, Atlanta, Georgia, United States of America
| | - Matthew Berriman
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- * E-mail: (JAC); (MB)
| | - Mark L. Eberhard
- Retired, Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - James A. Cotton
- Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- * E-mail: (JAC); (MB)
88
Yurchenko AA, Padioleau I, Matkarimov BT, Soulier J, Sarasin A, Nikolaev S. XPC deficiency increases risk of hematologic malignancies through mutator phenotype and characteristic mutational signature. Nat Commun 2020; 11:5834. [PMID: 33203900 PMCID: PMC7672101 DOI: 10.1038/s41467-020-19633-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/19/2020] [Accepted: 10/07/2020] [Indexed: 12/21/2022] Open
Abstract
Recent studies demonstrated a dramatically increased risk of leukemia in patients with a rare genetic disorder, Xeroderma Pigmentosum group C (XP-C), characterized by constitutive deficiency of global genome nucleotide excision repair (GG-NER). The genetic mechanisms of non-skin cancers in XP-C patients remain unexplored. In this study, we analyze a unique collection of internal XP-C tumor genomes including 6 leukemias and 2 sarcomas. We observe a specific mutational pattern and an average of 25-fold increase of mutation rates in XP-C versus sporadic leukemia which we presume leads to its elevated incidence and early appearance. We describe a strong mutational asymmetry with respect to transcription and the direction of replication in XP-C tumors suggesting association of mutagenesis with bulky purine DNA lesions of probably endogenous origin. These findings suggest existence of a balance between formation and repair of bulky DNA lesions by GG-NER in human body cells which is disrupted in XP-C patients. Xeroderma Pigmentosum group C (XP-C) is a rare genetic disorder characterised by deficient DNA repair leading to skin and internal cancer, but the latter is not well understood molecularly. Here the authors sequence genomes of non-skin cancers from XP-C patients to unravel its mutational patterns.
Affiliation(s)
- Andrey A Yurchenko
- INSERM U981, Gustave Roussy Cancer Campus, Université Paris Saclay, Villejuif, France
| | - Ismael Padioleau
- INSERM U981, Gustave Roussy Cancer Campus, Université Paris Saclay, Villejuif, France
| | - Bakhyt T Matkarimov
- National Laboratory Astana, Nazarbayev University, 010000, Astana, Kazakhstan
| | - Jean Soulier
- University of Paris, INSERM U944 and CNRS UMR7212, Institut de Recherche Saint-Louis, F-75010, Paris, France
| | - Alain Sarasin
- CNRS UMR9019 Genome Integrity and Cancers, Institut Gustave Roussy, Université Paris-Saclay, Villejuif, France
| | - Sergey Nikolaev
- INSERM U981, Gustave Roussy Cancer Campus, Université Paris Saclay, Villejuif, France.
89
A High Quality Asian Genome Assembly Identifies Features of Common Missing Regions. Genes (Basel) 2020; 11:genes11111350. [PMID: 33202901 PMCID: PMC7697454 DOI: 10.3390/genes11111350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 11/26/2022] Open
Abstract
The current human reference genome (GRCh38), with its superior quality, has contributed significantly to genome analysis. However, GRCh38 may still underrepresent ethnicity-specific genomic sequence, particularly for Asians, and exactly what is missing remains elusive. Here, we juxtaposed GRCh38 with a high-contiguity genome assembly of one Korean individual (AK1) to show that part of the AK1 genome is missing in GRCh38 and that the missing regions harbor ~1390 putative coding elements. Furthermore, by analyzing the reads that did not map to GRCh38 ("unmapped" reads) from fourteen individuals (five East Asians, four Europeans, and five Africans), we found that multiple populations shared certain parts of the missing genome, amounting to ~5.3 Mb (~0.2% of AK1) of the total genomic regions. The AK1 regions recovered from the "unmapped" reads, which are the estimated missing regions absent from GRCh38, harbored candidate coding elements. We verified that most of the common missing regions (shared by ≥7 individuals) exist in human and chimpanzee DNA. We further examined the occurrence mechanism, ethnic heterogeneity, and presence of the common missing regions. This study illuminates a potential advantage of using a pangenome reference and highlights the need for further investigation of the various features of regions globally missed in GRCh38.
90
Cheng X, DeGiorgio M. Flexible Mixture Model Approaches That Accommodate Footprint Size Variability for Robust Detection of Balancing Selection. Mol Biol Evol 2020; 37:3267-3291. [PMID: 32462188 PMCID: PMC7820363 DOI: 10.1093/molbev/msaa134] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/16/2022] Open
Abstract
Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.
Affiliation(s)
- Xiaoheng Cheng
- Huck Institutes of Life Sciences, Pennsylvania State University, University Park, PA
- Department of Biology, Pennsylvania State University, University Park, PA
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
91
Ghirotto S, Vizzari MT, Tassi F, Barbujani G, Benazzo A. Distinguishing among complex evolutionary models using unphased whole-genome data through random forest approximate Bayesian computation. Mol Ecol Resour 2020; 21:2614-2628. [PMID: 33000507 DOI: 10.1111/1755-0998.13263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/13/2019] [Revised: 08/28/2020] [Accepted: 09/07/2020] [Indexed: 01/25/2023]
Abstract
Inferring past demographic histories is crucial in population genetics, and the number of complete genomes now available should in principle facilitate this inference. In practice, however, the available inferential methods suffer from severe limitations. Although hundreds of complete genomes can be analysed simultaneously, complex demographic processes can easily exceed computational constraints, and the procedures for evaluating the reliability of the estimates further increase the computational effort. Here we present an approximate Bayesian computation framework based on the random forest algorithm (ABC-RF) to infer complex past population processes using complete genomes. To this end, we propose to summarize the data by the full genomic distribution of the four mutually exclusive categories of segregating sites (FDSS), a statistic that is fast to compute from unphased genome data and does not require the ancestral state of alleles to be known. We constructed an efficient ABC pipeline and tested how accurately it recognizes the true model among models of increasing complexity, using simulated data and taking into account different sampling strategies in terms of the number of individuals analysed and the number and size of the genetic loci considered. We also compared the FDSS with the unfolded and folded site frequency spectrum (SFS), and for these statistics we highlighted the experimental conditions that maximize the inferential power of the ABC-RF procedure. Finally, we analysed real data sets, testing models of the dispersal of anatomically modern humans out of Africa and exploring the evolutionary relationships of the three species of orangutan inhabiting Borneo and Sumatra.
Affiliation(s)
- Silvia Ghirotto
- Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
| | - Maria Teresa Vizzari
- Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
| | - Francesca Tassi
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Guido Barbujani
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
| | - Andrea Benazzo
- Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
92
Anzawa H, Yamagata H, Kinoshita K. Theoretical characterisation of strand cross-correlation in ChIP-seq. BMC Bioinformatics 2020; 21:417. [PMID: 32962634 PMCID: PMC7510163 DOI: 10.1186/s12859-020-03729-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/16/2019] [Accepted: 08/31/2020] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Strand cross-correlation profiles are used for both peak calling pre-analysis and quality control (QC) in chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis. Despite their potential for robust and accurate assessment of signal-to-noise ratio (S/N), which stems from their independence from peak calling, it remains unclear what aspects of quality such strand cross-correlation profiles actually measure. RESULTS We introduced a simple model to simulate the mapped read density of ChIP-seq and then derived the theoretical maximum and minimum of cross-correlation coefficients between strands. The results suggest that the maximum coefficient of typical ChIP-seq samples is directly proportional to the number of total mapped reads and the square of the ratio of signal reads, and inversely proportional to the number of peaks and the length of read-enriched regions. Simulation analysis supported our results, and evaluation using 790 ChIP-seq data sets obtained from a public database demonstrated high consistency between calculated cross-correlation coefficients and coefficients estimated from the theoretical relations and peak calling results. In addition, we found that mappability-bias correction improved sensitivity, enabling differentiation of maximum coefficients from the noise level. Based on these insights, we proposed virtual S/N (VSN), a novel peak call-free metric for S/N assessment. We also developed PyMaSC, a tool to calculate strand cross-correlation and VSN efficiently. VSN achieved the most consistent S/N estimation across various ChIP targets and sequencing read depths. Furthermore, we demonstrated that combining VSN with pre-existing peak calling results enables estimation of the number of detectable peaks for subsequent experiments and assessment of peak calling results.
CONCLUSIONS We present the first theoretical insights into the strand cross-correlation, and the results reveal the potential and the limitations of strand cross-correlation analysis. Our quality assessment framework using VSN provides peak call-independent QC and will help in the evaluation of peak call analysis in ChIP-seq experiments.
Affiliation(s)
- Hayato Anzawa
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
| | - Hitoshi Yamagata
- Advanced Research Laboratory, Canon Medical Systems Corporation, Otawara, Tochigi, Japan
| | - Kengo Kinoshita
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
- Advanced Research Laboratory, Canon Medical Systems Corporation, Otawara, Tochigi, Japan
- Tohoku Medical Megabank Organization, Sendai, Miyagi, Japan
- Institute of Development, Aging and Cancer, Tohoku University, Sendai, Miyagi, Japan
93
Zhuang X, Ye R, So MT, Lam WY, Karim A, Yu M, Ngo ND, Cherny SS, Tam PKH, Garcia-Barcelo MM, Tang CSM, Sham PC. A random forest-based framework for genotyping and accuracy assessment of copy number variations. NAR Genom Bioinform 2020; 2:lqaa071. [PMID: 33575619 PMCID: PMC7671382 DOI: 10.1093/nargab/lqaa071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/25/2020] [Revised: 08/18/2020] [Accepted: 08/26/2020] [Indexed: 12/24/2022] Open
Abstract
Detection of copy number variations (CNVs) is essential for uncovering genetic factors underlying human diseases. However, CNV detection by current methods is prone to error, and precisely identifying CNVs from paired-end whole genome sequencing (WGS) data is still challenging. Here, we present a framework, CNV-JACG, for Judging the Accuracy of CNVs and Genotyping using paired-end WGS data. CNV-JACG is based on a random forest model trained on 21 distinctive features characterizing the CNV region and its breakpoints. Using the data from the 1000 Genomes Project, Genome in a Bottle Consortium, the Human Genome Structural Variation Consortium and in-house technical replicates, we show that CNV-JACG has superior sensitivity over the latest genotyping method, SV2, particularly for the small CNVs (≤1 kb). We also demonstrate that CNV-JACG outperforms SV2 in terms of Mendelian inconsistency in trios and concordance between technical replicates. Our study suggests that CNV-JACG would be a useful tool in assessing the accuracy of CNVs to meet the ever-growing needs for uncovering the missing heritability linked to CNVs.
Affiliation(s)
- Xuehan Zhuang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Rui Ye
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Man-Ting So
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Wai-Yee Lam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Anwarul Karim
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Michelle Yu
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Ngoc Diem Ngo
- National Hospital of Pediatrics, Ha Noi 100000, Vietnam
| | - Stacey S Cherny
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Paul Kwong-Hang Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | | | - Clara Sze-Man Tang
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Pak Chung Sham
- Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
94
Abstract
Chromatin Conformation Capture techniques have unveiled several layers of chromosome organization such as the segregation in compartments, the folding in topologically associating domains (TADs), and site-specific looping interactions. The discovery of this genome hierarchical organization emerged from the computational analysis of chromatin capture data. With the increasing availability of such data, automatic pipelines for the robust comparison, grouping, and classification of multiple experiments are needed. Here we present a pipeline based on the TADbit framework that emphasizes reproducibility, automation, quality check, and statistical robustness. This comprehensive modular pipeline covers all the steps from the sequencing products to the visualization of reconstructed 3D models of the chromatin.
95
Mughal MR, Koch H, Huang J, Chiaromonte F, DeGiorgio M. Learning the properties of adaptive regions with functional data analysis. PLoS Genet 2020; 16:e1008896. [PMID: 32853200 PMCID: PMC7480868 DOI: 10.1371/journal.pgen.1008896] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/13/2019] [Revised: 09/09/2020] [Accepted: 05/29/2020] [Indexed: 12/12/2022] Open
Abstract
Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.
Affiliation(s)
- Mehreen R. Mughal
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Hillary Koch
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Jinguo Huang
- Bioinformatics and Genomics at the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
96
Richter F, Morton SU, Kim SW, Kitaygorodsky A, Wasson LK, Chen KM, Zhou J, Qi H, Patel N, DePalma SR, Parfenov M, Homsy J, Gorham JM, Manheimer KB, Velinder M, Farrell A, Marth G, Schadt EE, Kaltman JR, Newburger JW, Giardini A, Goldmuntz E, Brueckner M, Kim R, Porter GA, Bernstein D, Chung WK, Srivastava D, Tristani-Firouzi M, Troyanskaya OG, Dickel DE, Shen Y, Seidman JG, Seidman CE, Gelb BD. Genomic analyses implicate noncoding de novo variants in congenital heart disease. Nat Genet 2020; 52:769-777. [PMID: 32601476 PMCID: PMC7415662 DOI: 10.1038/s41588-020-0652-z] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 03/09/2019] [Accepted: 05/22/2020] [Indexed: 02/07/2023]
Abstract
A genetic etiology is identified for one-third of patients with congenital heart disease (CHD), with 8% of cases attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with those from 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in individuals with CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed an excess of DNVs in associated genes (27 genes versus 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (odds ratio (OR) = 2.5, 95% confidence interval (CI) 1.1-5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in 5 of 31 enhancers assayed. Finally, we observed a DNV burden in RNA-binding-protein regulatory sites (OR = 1.13, 95% CI 1.1-1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as that observed for damaging coding DNVs.
Affiliation(s)
- Felix Richter
- Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sarah U Morton
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Seong Won Kim
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Alexander Kitaygorodsky
- Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| | - Lauren K Wasson
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | | | - Jian Zhou
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Hongjian Qi
- Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| | - Nihir Patel
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | | | | | - Jason Homsy
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Center for External Innovation, Takeda Pharmaceuticals USA, Cambridge, MA, USA
| | - Joshua M Gorham
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Kathryn B Manheimer
- Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
| | - Matthew Velinder
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Andrew Farrell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Gabor Marth
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sema4, Stamford, CT, USA
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Jonathan R Kaltman
- Heart Development and Structural Diseases Branch, Division of Cardiovascular Sciences, NHLBI/NIH, Bethesda, MD, USA
| | | | | | - Elizabeth Goldmuntz
- Division of Cardiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Martina Brueckner
- Departments of Pediatrics and Genetics, Yale University School of Medicine, New Haven, CT, USA
| | - Richard Kim
- Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - George A Porter
- Department of Pediatrics, University of Rochester, Rochester, NY, USA
| | - Daniel Bernstein
- Department of Pediatrics, Stanford University, Palo Alto, CA, USA
| | - Wendy K Chung
- Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, NY, USA
| | - Deepak Srivastava
- Gladstone Institute of Cardiovascular Disease and University of California San Francisco, San Francisco, CA, USA
| | - Martin Tristani-Firouzi
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Olga G Troyanskaya
- Flatiron Institute, Simons Foundation, New York, NY, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Department of Computer Science, Princeton University, Princeton, NJ, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab, Berkeley, CA, USA
| | - Yufeng Shen
- Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
| | | | - Christine E Seidman
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Cardiology, Brigham and Women's Hospital, Boston, MA, USA
| | - Bruce D Gelb
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
97
Wu C, Zhao X, Welsh M, Costello K, Cao K, Abou Tayoun A, Li M, Sarmady M. Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing. Clin Chem 2020; 66:239-246. [PMID: 31672855 DOI: 10.1373/clinchem.2019.308213] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/06/2019] [Accepted: 08/19/2019] [Indexed: 12/25/2022]
Abstract
BACKGROUND Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning-based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label "uncertain" variants. RESULTS The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as "uncertain," with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
Affiliation(s)
- Chao Wu
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Xiaonan Zhao
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Mark Welsh
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | | | - Kajia Cao
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Ahmad Abou Tayoun
- Department of Genetics, Al Jalila Children's Specialty Hospital, Dubai, UAE
| | - Marilyn Li
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
- Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA
| | - Mahdi Sarmady
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
- Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA
- Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA
98
Germline de novo mutation rates on exons versus introns in humans. Nat Commun 2020; 11:3304. [PMID: 32620809 PMCID: PMC7334200 DOI: 10.1038/s41467-020-17162-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/07/2020] [Accepted: 06/02/2020] [Indexed: 02/06/2023] Open
Abstract
A main assumption of molecular population genetics is that genomic mutation rate does not depend on sequence function. Challenging this assumption, a recent study has found a reduction in the mutation rate in exons compared to introns in somatic cells, ascribed to an enhanced exonic mismatch repair system activity. If this reduction happens also in the germline, it can compromise studies of population genomics, including the detection of selection when using introns as proxies for neutrality. Here we compile and analyze published germline de novo mutation data to test if the exonic mutation rate is also reduced in germ cells. After controlling for sampling bias in datasets with diseased probands and extended nucleotide context dependency, we find no reduction in the mutation rate in exons compared to introns in the germline. Therefore, there is no evidence that enhanced exonic mismatch repair activity determines the mutation rate in germline cells. Evidence that somatic mutation rates in introns exceed those in exons challenges the molecular evolution tenet that mutation rate and sequence function are independent. Here, authors analyze germline de novo mutations and reveal no evidence for mutation rate differences between exons and introns.
99
Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C, Buyske S, Matise TC, Muzny DM, Zody MC, Lander ES, Dutcher SK, Stitziel NO, Hall IM. Mapping and characterization of structural variation in 17,795 human genomes. Nature 2020; 583:83-89. [PMID: 32460305 PMCID: PMC7547914 DOI: 10.1038/s41586-020-2371-0] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 12/29/2018] [Accepted: 05/18/2020] [Indexed: 12/18/2022]
Abstract
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline (ref. 1) to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
Affiliation(s)
- Haley J Abel
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- David E Larson
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Allison A Regier
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
- Colby Chiang
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Indraniel Das
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Krishna L Kanchi
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Ryan M Layer
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Computer Science, University of Colorado, Boulder, CO, USA
- Benjamin M Neale
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Steven Buyske
  - Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Tara C Matise
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Susan K Dutcher
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Nathan O Stitziel
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
- Ira M Hall
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
100
Feng C, Dai M, Liu Y, Chen M. Sequence repetitiveness quantification and de novo repeat detection by weighted k-mer coverage. Brief Bioinform 2020; 22:5855256. [PMID: 32591772 DOI: 10.1093/bib/bbaa086] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/23/2020] [Revised: 04/10/2020] [Accepted: 04/22/2020] [Indexed: 11/12/2022]
Abstract
DNA repeats are abundant in eukaryotic genomes and have been proved to play a vital role in genome evolution and regulation. A large number of approaches have been proposed to identify various repeats in the genome. Some de novo repeat identification tools can efficiently generate sequence repetitive scores based on k-mer counting for repeat detection. However, we noticed that these tools can still be improved in terms of repetitive score calculation, sensitivity to segmental duplications and detection specificity. Therefore, here, we present a new computational approach named Repeat Locator (RepLoc), which is based on weighted k-mer coverage to quantify the genome sequence repetitiveness and locate the repetitive sequences. According to the repetitiveness map of the human genome generated by RepLoc, we found that there may be relationships between sequence repetitiveness and genome structures. A comprehensive benchmark shows that RepLoc is a more efficient k-mer counting based tool for de novo repeat detection. The RepLoc software is freely available at http://bis.zju.edu.cn/reploc.
Affiliation(s)
- Cong Feng
  - Ming Chen's laboratory in Zhejiang University
- Min Dai
  - Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University