1. Jiao J, Lv X, Shen C, Morigen M. Genome and transcriptomic analysis of the adaptation of Escherichia coli to environmental stresses. Comput Struct Biotechnol J 2024; 23:2132-2140. [PMID: 38817967; PMCID: PMC11137339; DOI: 10.1016/j.csbj.2024.05.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/05/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] [Open Access]
Abstract
In natural niches, bacteria are forced to spend most of their lives under various environmental stresses, such as nutrient limitation, heavy metal pollution, heat and antibiotic stress. To cope with adverse environments, the bacterial genome can, during the life cycle, produce potential adaptive mutants. Genomic changes, especially mutations in the genes that encode RNA polymerase and transcription factors, might lead to variations in the transcriptome. These variations enable bacteria to cope with environmental stresses through physiological adaptation in response to stress. This paper reviews the recent contributions of genomic and transcriptomic analyses to understanding the adaptation mechanisms of Escherichia coli to environmental stresses. Various genomic changes have been observed in E. coli strains in the laboratory or under natural stresses, including starvation, heavy metals, acidic conditions, heat shock and antibiotics. The mutations include slight changes (one to several nucleotides), deletions, insertions, chromosomal rearrangements and variations in copy numbers. The transcriptome of E. coli largely changes due to genomic mutations. However, the transcriptional profiles vary due to variations in stress selections. Cellular adaptation to the selections is associated with transcriptional changes resulting from genomic mutations. Changes in genome and transcriptome are cooperative and jointly affect the adaptation of E. coli to different environments. This comprehensive review reveals that coordination of genome mutations and transcriptional variations needs to be explored further to provide a better understanding of the mechanisms of bacterial adaptation to stresses.
Affiliation(s)
- Jianlu Jiao
  - State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
- Xiaoli Lv
  - Inner Mongolia Key Laboratory for Molecular Regulation of the Cell, School of Life Sciences, Inner Mongolia University, Hohhot, China
- Chongjie Shen
  - State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
- Morigen Morigen
  - State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, China
  - Inner Mongolia Key Laboratory for Molecular Regulation of the Cell, School of Life Sciences, Inner Mongolia University, Hohhot, China
2. Redaelli S, Grati FR, Tritto V, Giannuzzi G, Recalcati MP, Sala E, Villa N, Crosti F, Roversi G, Malvestiti F, Zanatta V, Repetti E, Rodeschini O, Valtorta C, Catusi I, Romitti L, Martinoli E, Conconi D, Dalprà L, Lavitrano M, Riva P, Bentivegna A. Olfactory receptor genes and chromosome 11 structural aberrations: Players or spectators? HGG Adv 2024; 5:100261. [PMID: 38160254; PMCID: PMC10820794; DOI: 10.1016/j.xhgg.2023.100261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 12/28/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024] [Open Access]
Abstract
The largest multi-gene family in metazoans is the family of olfactory receptor (OR) genes. Human ORs are organized in clusters over most chromosomes and seem to include >0.1% of the human genome. Because 369 out of 856 OR genes are mapped on chromosome 11 (HSA11), we sought to determine whether they mediate structural rearrangements involving this chromosome. To this aim, we analyzed 220 specimens collected during diagnostic procedures involving structural rearrangements of chromosome 11. A total of 222 chromosomal abnormalities were included, consisting of inversions, deletions, translocations, duplications, and one insertion, detected by conventional chromosome analysis and/or fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (array-CGH). We verified by bioinformatics and statistical approaches the occurrence of breakpoints in cytobands with or without OR genes. We found that OR genes are not involved in chromosome 11 reciprocal translocations, suggesting that different DNA motifs and mechanisms based on homology or non-homology recombination can cause chromosome 11 structural alterations. We also considered the proximity between the chromosomal territories of chromosome 11 and its partner chromosomes involved in the translocations by using the deposited Hi-C data concerning the possible occurrence of chromosome interactions. Interestingly, most of the breakpoints are located in regions highly involved in chromosome interactions. Further studies should be carried out to confirm the potential role of chromosome territories' proximity in promoting genome structural variation, which is fundamental to our understanding of the molecular basis of medical genetics and evolutionary genetics.
Affiliation(s)
- Serena Redaelli
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
- Francesca Romana Grati
  - R&D, Cytogenetics, Molecular Genetics and Medical Genetics Unit, Toma Advanced Biomedical Assays S.p.A. (ImpactLab), 21052 Busto Arsizio, Italy
- Viviana Tritto
  - Department of Medical Biotechnology and Translational Medicine, University of Milan, 20122 Milan, Italy
- Maria Paola Recalcati
  - IRCCS Istituto Auxologico Italiano, Medical Cytogenetics Laboratory, 20095 Cusano Milanino, Italy
- Elena Sala
  - UC Medical Genetics, Fondazione IRCCS San Gerardo dei Tintori, 20900 Monza, Italy
- Nicoletta Villa
  - UC Medical Genetics, Fondazione IRCCS San Gerardo dei Tintori, 20900 Monza, Italy
- Francesca Crosti
  - UC Medical Genetics, Fondazione IRCCS San Gerardo dei Tintori, 20900 Monza, Italy
- Gaia Roversi
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
  - UC Medical Genetics, Fondazione IRCCS San Gerardo dei Tintori, 20900 Monza, Italy
- Francesca Malvestiti
  - R&D, Cytogenetics, Molecular Genetics and Medical Genetics Unit, Toma Advanced Biomedical Assays S.p.A. (ImpactLab), 21052 Busto Arsizio, Italy
- Valentina Zanatta
  - R&D, Cytogenetics, Molecular Genetics and Medical Genetics Unit, Toma Advanced Biomedical Assays S.p.A. (ImpactLab), 21052 Busto Arsizio, Italy
- Elena Repetti
  - R&D, Cytogenetics, Molecular Genetics and Medical Genetics Unit, Toma Advanced Biomedical Assays S.p.A. (ImpactLab), 21052 Busto Arsizio, Italy
- Ornella Rodeschini
  - IRCCS Istituto Auxologico Italiano, Medical Cytogenetics Laboratory, 20095 Cusano Milanino, Italy
- Chiara Valtorta
  - IRCCS Istituto Auxologico Italiano, Medical Cytogenetics Laboratory, 20095 Cusano Milanino, Italy
- Ilaria Catusi
  - IRCCS Istituto Auxologico Italiano, Medical Cytogenetics Laboratory, 20095 Cusano Milanino, Italy
- Lorenza Romitti
  - Pathology and Cytogenetics Laboratory, Clinical Pathology Department, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, 20162 Milan, Italy
- Emanuela Martinoli
  - Department of Medical Biotechnology and Translational Medicine, University of Milan, 20122 Milan, Italy
- Donatella Conconi
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
- Leda Dalprà
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
  - UC Medical Genetics, Fondazione IRCCS San Gerardo dei Tintori, 20900 Monza, Italy
- Marialuisa Lavitrano
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
- Paola Riva
  - Department of Medical Biotechnology and Translational Medicine, University of Milan, 20122 Milan, Italy
- Angela Bentivegna
  - School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy
3. Fleury H, MacEachern MK, Stiefel CM, Anand R, Sempeck C, Nebenfuehr B, Maurer-Alcalá K, Ball K, Proctor B, Belan O, Taylor E, Ortega R, Dodd B, Weatherly L, Dansoko D, Leung JW, Boulton SJ, Arnoult N. The APE2 nuclease is essential for DNA double-strand break repair by microhomology-mediated end joining. Mol Cell 2023; 83:1429-1445.e8. [PMID: 37044098; PMCID: PMC10164096; DOI: 10.1016/j.molcel.2023.03.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/21/2022] [Revised: 01/18/2023] [Accepted: 03/16/2023] [Indexed: 04/14/2023]
Abstract
Microhomology-mediated end joining (MMEJ) is an intrinsically mutagenic pathway of DNA double-strand break (DSB) repair essential for proliferation of homologous recombination (HR)-deficient tumors. Although targeting MMEJ has emerged as a powerful strategy to eliminate HR-deficient (HRD) cancers, this is limited by an incomplete understanding of the mechanism and factors required for MMEJ repair. Here, we identify the APE2 nuclease as an MMEJ effector. We show that loss of APE2 inhibits MMEJ at deprotected telomeres and at intra-chromosomal DSBs and is epistatic with Pol Theta for MMEJ activity. Mechanistically, we demonstrate that APE2 possesses intrinsic flap-cleaving activity, that its MMEJ function in cells depends on its nuclease activity, and further identify an uncharacterized domain required for its recruitment to DSBs. We conclude that this previously unappreciated role of APE2 in MMEJ contributes to the addiction of HRD cells to APE2, which could be exploited in the treatment of cancer.
Affiliation(s)
- Hubert Fleury
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Myles K MacEachern
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Clara M Stiefel
  - Department of Radiation Oncology, University of Texas Health Science Center, San Antonio, TX, USA
- Roopesh Anand
  - DSB Repair Metabolism Laboratory, The Francis Crick Institute, London, UK
- Colin Sempeck
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Benjamin Nebenfuehr
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Kelper Maurer-Alcalá
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Kerri Ball
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Bruce Proctor
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Ondrej Belan
  - DSB Repair Metabolism Laboratory, The Francis Crick Institute, London, UK
- Erin Taylor
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Raquel Ortega
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Benjamin Dodd
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Laila Weatherly
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Djelika Dansoko
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
- Justin W Leung
  - Department of Radiation Oncology, University of Texas Health Science Center, San Antonio, TX, USA
- Simon J Boulton
  - DSB Repair Metabolism Laboratory, The Francis Crick Institute, London, UK
  - Artios Pharma Ltd, Babraham Research Campus, Cambridge CB22 3FH, UK
- Nausica Arnoult
  - Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, USA
4. Wang Y, Ling Y, Gong J, Zhao X, Zhou H, Xie B, Lou H, Zhuang X, Jin L, Fan S, Zhang G, Xu S. PGG.SV: a whole-genome-sequencing-based structural variant resource and data analysis platform. Nucleic Acids Res 2022; 51:D1109-D1116. [PMID: 36243989; PMCID: PMC9825616; DOI: 10.1093/nar/gkac905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/21/2022] [Accepted: 10/04/2022] [Indexed: 01/30/2023] [Open Access]
Abstract
Structural variations (SVs) play important roles in human evolution and diseases, but there is a lack of data resources concerning representative samples, especially for East Asians. Taking advantage of both next-generation sequencing and third-generation sequencing data at the whole-genome level, we developed the database PGG.SV to provide a practical platform for both regionally and globally representative structural variants. In its current version, PGG.SV archives 584 277 SVs obtained from whole-genome sequencing data of 6048 samples, including 1030 long-read sequencing genomes representing 177 global populations. PGG.SV provides (i) high-quality SVs with fine-scale and precise genomic locations in both GRCh37 and GRCh38, covering underrepresented SVs in existing sequencing and microarray data; (ii) hierarchical estimation of SV prevalence in geographical populations; (iii) informative annotations of SV-related genes, potential functions and clinical effects; (iv) an analysis platform to facilitate SV-based case-control association studies and (v) various visualization tools for understanding the SV structures in the human genome. Taken together, PGG.SV provides a user-friendly online interface, easy-to-use analysis tools and a detailed presentation of results. PGG.SV is freely accessible via https://www.biosino.org/pggsv.
Affiliation(s)
- Xiaohan Zhao
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
- Hanwen Zhou
  - Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Bo Xie
  - Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Haiyi Lou
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
- Xinhao Zhuang
  - Key Laboratory of Computational Biology, National Genomics Data Center & Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Li Jin
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 200438, China
  - Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai 201203, China
- Shaohua Fan
  - Correspondence may also be addressed to Shaohua Fan.
- Guoqing Zhang
  - Correspondence may also be addressed to Guoqing Zhang.
- Shuhua Xu
  - To whom correspondence should be addressed. Tel: +86 21 31246617; Fax: +86 21 31246617;
5. Kim J, Huang AY, Johnson SL, Lai J, Isacco L, Jeffries AM, Miller MB, Lodato MA, Walsh CA, Lee EA. Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders. Nat Commun 2022; 13:5918. [PMID: 36207339; PMCID: PMC9546902; DOI: 10.1038/s41467-022-33642-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/02/2021] [Accepted: 09/26/2022] [Indexed: 01/21/2023] [Open Access]
Abstract
Replication errors and various genotoxins cause DNA double-strand breaks (DSBs) where error-prone repair creates genomic mutations, most frequently focal deletions, and defective repair may lead to neurodegeneration. Despite its pathophysiological importance, the extent to which faulty DSB repair alters the genome, and the mechanisms by which mutations arise, have not been systematically examined reflecting ineffective methods. Here, we develop PhaseDel, a computational method to detect focal deletions and characterize underlying mechanisms in single-cell whole genome sequences (scWGS). We analyzed high-coverage scWGS of 107 single neurons from 18 neurotypical individuals of various ages, and found that somatic deletions increased with age and in highly expressed genes in human brain. Our analysis of 50 single neurons from DNA repair-deficient diseases with progressive neurodegeneration (Cockayne syndrome, Xeroderma pigmentosum, and Ataxia telangiectasia) reveals elevated somatic deletions compared to age-matched controls. Distinctive mechanistic signatures and transcriptional associations suggest roles for somatic deletions in neurodegeneration.
Affiliation(s)
- Junho Kim
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biological Sciences, Sungkyunkwan University, Suwon, South Korea
- August Yue Huang
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shelby L Johnson
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Jenny Lai
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Isacco
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
- Ailsa M Jeffries
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Michael B Miller
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Michael A Lodato
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA, USA
  - Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
- Christopher A Walsh
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
- Eunjung Alice Lee
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Manton Center for Orphan Disease, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
6. Vialle RA, de Paiva Lopes K, Bennett DA, Crary JF, Raj T. Integrating whole-genome sequencing with multi-omic data reveals the impact of structural variants on gene regulation in the human brain. Nat Neurosci 2022; 25:504-514. [PMID: 35288716; PMCID: PMC9245608; DOI: 10.1038/s41593-022-01031-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 03/08/2021] [Accepted: 02/07/2022] [Indexed: 11/09/2022]
Abstract
Structural variants (SVs), genomic rearrangements of >50 bp, are an important source of genetic diversity and have been linked to many diseases. However, it remains unclear how they modulate human brain function and disease risk. Here, we report 170,996 SVs discovered using 1,760 short-read whole genomes from aged adults and Alzheimer’s disease individuals. By applying quantitative trait locus (SV-xQTL) analyses, we quantified the impact of cis-acting SVs on histone modifications, gene expression, splicing, and protein abundance in post-mortem brain tissues. More than 3,200 SVs were associated with at least one molecular phenotype. We found reproducibility of 65–99% SV-eQTLs across cohorts and brain regions. SV associations with mRNA and proteins shared the same direction of effect in more than 87% of SV-gene pairs. Mediation analysis showed ~8% of SV-eQTLs mediated by histone acetylation, and ~11% by splicing. Additionally, associations of SVs with progressive supranuclear palsy identified previously known and novel SVs.
Affiliation(s)
- Ricardo A Vialle
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Katia de Paiva Lopes
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- David A Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- John F Crary
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Towfique Raj
  - Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
7. Qi M, Stenson PD, Ball EV, Tainer JA, Bacolla A, Kehrer-Sawatzki H, Cooper DN, Zhao H. Distinct sequence features underlie microdeletions and gross deletions in the human genome. Hum Mutat 2021; 43:328-346. [PMID: 34918412; PMCID: PMC9069542; DOI: 10.1002/humu.24314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/10/2021] [Revised: 11/02/2021] [Accepted: 12/14/2021] [Indexed: 11/18/2022]
Abstract
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non‐B DNA‐forming repeats, GC‐content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp) while significantly (χ2, p < 2E−16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut‐off of 25–30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
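The two quantities this abstract relies on, deletion length relative to a roughly 30 bp cut-off and microhomology flanking the breakpoints, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0-based end-exclusive coordinates, the default cut-off of 30 bp (taken from the ≤30 bp versus >30 bp split mentioned above) and the toy sequence are all assumptions made for illustration. Microhomology is measured here as the number of positions by which the deleted interval can be shifted without changing the resulting sequence, capped at the deletion length.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Microhomology (bp) flanking a deletion of ref[start:end].

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    # Bases at the start of the deleted segment that repeat right after it.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Bases just before the deletion that repeat at the end of the deleted segment.
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    # Cap at the deletion length so tandem repeats do not inflate the value.
    return min(left + right, end - start)


def classify_deletion(length_bp: int, cutoff_bp: int = 30) -> str:
    """Label a deletion by size; cutoff_bp is a hypothetical default."""
    return "microdeletion" if length_bp <= cutoff_bp else "gross deletion"


# Toy reference: TTG | ACC AGTC | ACC GG, deleting "ACCAGTC" (positions 3..9).
ref = "TTGACCAGTCACCGG"
mh = microhomology_length(ref, 3, 10)
label = classify_deletion(10 - 3)
print(mh, label)
```

In the toy example the 3 bp motif ACC occurs both at the start of the deleted segment and immediately after it, so the sketch reports a microhomology of 3 bp for a 7 bp deletion.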
Affiliation(s)
- Mengling Qi
  - Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Edward V Ball
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- John A Tainer
  - Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Albino Bacolla
  - Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
- Huiying Zhao
  - Department of Medical Research Center, Sun Yat-sen Memorial Hospital; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Guangzhou, China
8. Long-read technologies identify a hidden inverted duplication in a family with choroideremia. HGG Adv 2021; 2:100046. [PMID: 35047838; PMCID: PMC8756506; DOI: 10.1016/j.xhgg.2021.100046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 07/01/2021] [Indexed: 12/03/2022] [Open Access]
Abstract
The lack of molecular diagnoses in rare genetic diseases can be explained by limitations of current standard genomic technologies. Upcoming long-read techniques have complementary strengths to overcome these limitations, with a particular strength in identifying structural variants. By using optical genome mapping and long-read sequencing, we aimed to identify the pathogenic variant in a large family with X-linked choroideremia. In this family, aberrant splicing of exon 12 of the choroideremia gene CHM was detected in 2003, but the underlying genomic defect remained elusive. Optical genome mapping and long-read sequencing approaches now revealed an intragenic 1,752 bp inverted duplication including exon 12 and surrounding regions, located downstream of the wild-type copy of exon 12. Both breakpoint junctions were confirmed with Sanger sequencing and segregate with the X-linked inheritance in the family. The breakpoint junctions displayed sequence microhomology suggestive for an erroneous replication mechanism as the origin of the structural variant. The inverted duplication is predicted to result in a hairpin formation of the pre-mRNA with the wild-type exon 12, leading to exon skipping in the mature mRNA. The identified inverted duplication is deemed the hidden pathogenic cause of disease in this family. Our study shows that optical genome mapping and long-read sequencing have significant potential for the identification of (hidden) structural variants in rare genetic diseases.
9. Badet T, Fouché S, Hartmann FE, Zala M, Croll D. Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen. Nat Commun 2021; 12:3551. [PMID: 34112792; PMCID: PMC8192914; DOI: 10.1038/s41467-021-23862-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/30/2020] [Accepted: 05/11/2021] [Indexed: 02/05/2023] [Open Access]
Abstract
Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionally driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
Affiliation(s)
- Thomas Badet
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
- Simone Fouché
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
  - Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Fanny E Hartmann
  - Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Marcello Zala
  - Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Daniel Croll
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
|
10
|
Tran Q, Abyzov A. LongAGE: defining breakpoints of genomic structural variants through optimal and memory efficient alignments of long reads. Bioinformatics 2021; 37:1015-1017. [PMID: 32777815 PMCID: PMC8128450 DOI: 10.1093/bioinformatics/btaa703] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2020] [Revised: 07/07/2020] [Accepted: 07/29/2020] [Indexed: 11/25/2022] Open
Abstract
Summary: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation, LongAGE, based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. Availability and implementation: LongAGE is implemented in C++ and available on GitHub at https://github.com/Coaxecva/LongAGE. Supplementary information: Supplementary data are available at Bioinformatics online.
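The memory saving that the abstract attributes to the Hirschberg algorithm can be illustrated with a minimal Python sketch. This is plain global alignment with a linear gap penalty, not AGE's gap-excision scoring and not the LongAGE C++ code; the scoring constants and function names are assumptions chosen for illustration. Only two score rows are kept in memory at any time, and the alignment is recovered by divide and conquer.

```python
# Illustrative scoring scheme (assumed values, not taken from LongAGE).
MATCH, MISMATCH, GAP = 1, -1, -1

def nw_last_row(a, b):
    """Last row of the global-alignment score matrix, using O(len(b)) memory."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr[j] = max(prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev

def hirschberg(a, b):
    """Return an optimal global alignment (top, bottom) in linear space."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single character with the best position in b,
        # or with a gap if no pairing beats gapping everything.
        best_k, best = None, (len(b) + 1) * GAP
        for k, ch in enumerate(b):
            s = (MATCH if ch == a else MISMATCH) + (len(b) - 1) * GAP
            if s > best:
                best_k, best = k, s
        if best_k is None:
            return a + "-" * len(b), "-" + b
        return "-" * best_k + a + "-" * (len(b) - best_k - 1), b
    mid = len(a) // 2
    # Scores of aligning the first half of a with every prefix of b ...
    left = nw_last_row(a[:mid], b)
    # ... and the second half of a with every suffix of b (via reversal).
    right = nw_last_row(a[mid:][::-1], b[::-1])
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    top1, bot1 = hirschberg(a[:mid], b[:split])
    top2, bot2 = hirschberg(a[mid:], b[split:])
    return top1 + top2, bot1 + bot2

def alignment_score(top, bottom):
    """Score an alignment column by column under the same scheme."""
    total = 0
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            total += GAP
        else:
            total += MATCH if p == q else MISMATCH
    return total

top, bottom = hirschberg("AGTACGCA", "TATGC")
print(top)
print(bottom)
```

The trade-off is roughly a doubling of compute time in exchange for memory that grows linearly rather than quadratically with sequence length, which is what makes reads longer than 10 kb tractable.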
Affiliation(s)
- Quang Tran
  - Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
|
11
|
Safka Brozkova D, Uhrova Meszarosova A, Lassuthova P, Varga L, Staněk D, Borecká S, Laštůvková J, Čejnová V, Rašková D, Lhota F, Gašperíková D, Seeman P. The Cause of Hereditary Hearing Loss in GJB2 Heterozygotes-A Comprehensive Study of the GJB2/DFNB1 Region. Genes (Basel) 2021; 12:684. [PMID: 34062854 PMCID: PMC8147375 DOI: 10.3390/genes12050684] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/23/2021] [Revised: 04/27/2021] [Accepted: 04/28/2021] [Indexed: 12/14/2022] Open
Abstract
Hearing loss is a genetically heterogeneous sensory defect, and the frequent causes are biallelic pathogenic variants in the GJB2 gene. However, patients carrying only one heterozygous pathogenic (monoallelic) GJB2 variant represent a long-standing diagnostic problem. Interestingly, previous results showed that individuals with a heterozygous pathogenic GJB2 variant are two times more prevalent among those with hearing loss compared to normal-hearing individuals. This excess among patients led us to hypothesize that there could be another pathogenic variant in the GJB2 region/DFNB1 locus. A hitherto undiscovered variant could, in part, explain the cause of hearing loss in patients and would mean reclassifying them as patients with GJB2 biallelic pathogenic variants. In order to detect an unknown causal variant, we examined 28 patients using NGS with probes that continuously cover the 0.4 Mb in the DFNB1 region. An additional 49 patients were examined by WES to uncover only carriers. We did not reveal a second pathogenic variant in the DFNB1 region. However, in 19% of the WES-examined patients, the cause of hearing loss was found to be in genes other than GJB2. We present evidence to show that a substantial number of patients are carriers of the GJB2 pathogenic variant, albeit only by chance.
Affiliation(s)
- Dana Safka Brozkova
  - Neurogenetic laboratory, Department of Paediatric Neurology, 2nd Faculty of Medicine, Charles University and University Hospital Motol, 15006 Prague, Czech Republic; (A.U.M.); (P.L.); (D.S.); (P.S.)
  - Correspondence:
- Anna Uhrova Meszarosova
  - Neurogenetic laboratory, Department of Paediatric Neurology, 2nd Faculty of Medicine, Charles University and University Hospital Motol, 15006 Prague, Czech Republic; (A.U.M.); (P.L.); (D.S.); (P.S.)
- Petra Lassuthova
  - Neurogenetic laboratory, Department of Paediatric Neurology, 2nd Faculty of Medicine, Charles University and University Hospital Motol, 15006 Prague, Czech Republic; (A.U.M.); (P.L.); (D.S.); (P.S.)
- Lukáš Varga
  - Department of Otorhinolaryngology–Head and Neck Surgery, Faculty of Medicine and University Hospital, Comenius University, 85107 Bratislava, Slovakia;
  - Diabgene Laboratory, Institute of Experimental Endocrinology, Biomedical Research Center, Slovak Academy of Sciences, 84505 Bratislava, Slovakia; (S.B.); (D.G.)
- David Staněk
  - Neurogenetic laboratory, Department of Paediatric Neurology, 2nd Faculty of Medicine, Charles University and University Hospital Motol, 15006 Prague, Czech Republic; (A.U.M.); (P.L.); (D.S.); (P.S.)
- Silvia Borecká
  - Diabgene Laboratory, Institute of Experimental Endocrinology, Biomedical Research Center, Slovak Academy of Sciences, 84505 Bratislava, Slovakia; (S.B.); (D.G.)
- Jana Laštůvková
  - Department of Medical Genetics, Masaryk Hospital in Usti nad Labem, Regional Health Corporation, 40011 Ústí nad Labem, Czech Republic; (J.L.); (V.Č.)
- Vlasta Čejnová
  - Department of Medical Genetics, Masaryk Hospital in Usti nad Labem, Regional Health Corporation, 40011 Ústí nad Labem, Czech Republic; (J.L.); (V.Č.)
- Dagmar Rašková
  - Centre for Medical Genetics and Reproductive Medicine GENNET, 17000 Prague, Czech Republic; (D.R.); (F.L.)
- Filip Lhota
  - Centre for Medical Genetics and Reproductive Medicine GENNET, 17000 Prague, Czech Republic; (D.R.); (F.L.)
- Daniela Gašperíková
  - Diabgene Laboratory, Institute of Experimental Endocrinology, Biomedical Research Center, Slovak Academy of Sciences, 84505 Bratislava, Slovakia; (S.B.); (D.G.)
- Pavel Seeman
  - Neurogenetic laboratory, Department of Paediatric Neurology, 2nd Faculty of Medicine, Charles University and University Hospital Motol, 15006 Prague, Czech Republic; (A.U.M.); (P.L.); (D.S.); (P.S.)
|
12
|
van den Akker J, Hon L, Ondov A, Mahkovec Z, O'Connor R, Chan RC, Lock J, Zimmer AD, Rostamianfar A, Ginsberg J, Leon A, Topper S. Intronic Breakpoint Signatures Enhance Detection and Characterization of Clinically Relevant Germline Structural Variants. J Mol Diagn 2021; 23:612-629. [PMID: 33621668 DOI: 10.1016/j.jmoldx.2021.01.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/21/2020] [Revised: 12/14/2020] [Accepted: 01/27/2021] [Indexed: 12/16/2022] Open
Abstract
The relevance of large copy number variants (CNVs) to hereditary disorders has been long recognized, and population sequencing efforts have chronicled many common structural variants (SVs). However, limited data are available on the clinical contribution of rare germline SVs. Here, a detailed characterization of SVs identified using targeted next-generation sequencing was performed. Across 50 genes associated with hereditary cancer and cardiovascular disorders, a minimum of 828 unique SVs were reported, including 584 fully characterized SVs. Almost 40% of CNVs were <5 kb, with one in three deletions impacting a single exon. Additionally, 36 mid-range deletions/duplications (50 to 250 bp), 21 mobile element insertions, 6 inversions, and 27 complex rearrangements were detected. This data set was used to model SV detection in a bioinformatics pipeline solely relying on read depth, which revealed that genome sequencing (30×) allows detection of 71%, a 500× panel only targeting coding regions 53%, and exome sequencing (100×) <20% of characterized SVs. SVs accounted for 14.1% of all unique pathogenic variants, supporting the importance of SVs in hereditary disorders. Robust SV detection requires an ensemble of variant-calling algorithms that utilize sequencing of intronic regions. These algorithms should use distinct data features representative of each class of mutational mechanism, including recombination between two sequences sharing high similarity, covariants inserted between CNV breakpoints, and complex rearrangements containing inverted sequences.
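As a rough illustration of what a pipeline "solely relying on read depth" does, the following Python sketch labels targets from the log2 ratio of normalized depth against a reference. It is not the authors' pipeline: the function name, the 0.5 pseudocount, and the thresholds of -0.6 and 0.4 are assumptions chosen so that a heterozygous deletion (ratio near 0.5) and a single-copy duplication (ratio near 1.5) fall outside the neutral band.

```python
import math

def call_cnv_by_depth(sample_depth, reference_depth,
                      del_thresh=-0.6, dup_thresh=0.4):
    """Label each target DEL, DUP, or NEUTRAL from its log2 depth ratio.

    sample_depth / reference_depth: per-target read counts in the same order.
    Thresholds are illustrative assumptions, not published values.
    """
    sample_total = sum(sample_depth)
    ref_total = sum(reference_depth)
    calls = []
    for s, r in zip(sample_depth, reference_depth):
        # Normalize each target by library size; the pseudocount avoids log(0).
        ratio = ((s + 0.5) / sample_total) / ((r + 0.5) / ref_total)
        log2_ratio = math.log2(ratio)
        if log2_ratio <= del_thresh:
            calls.append("DEL")
        elif log2_ratio >= dup_thresh:
            calls.append("DUP")
        else:
            calls.append("NEUTRAL")
    return calls

# Five targets; the third has half the expected depth, the fifth 1.5 times.
print(call_cnv_by_depth([100, 100, 50, 100, 150], [100, 100, 100, 100, 100]))
```

A caller of this kind only sees targets that are actually sequenced, which is consistent with the abstract's point that coding-only panels and exomes miss events whose breakpoints lie in introns.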
|
13
|
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing. Methods Mol Biol 2021; 2243:1-25. [PMID: 33606250 DOI: 10.1007/978-1-0716-1103-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is now feasible to interrogate all major classes of variation in an individual across the entire genome for less than $1000 USD. While the generation of patient sequence information using these technologies has become routine, the analysis and interpretation of this data remains the greatest obstacle to widespread clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant information required for clinical report generation. We discuss methods to detect each variant class and describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we describe a sample workflow for synthesizing large amounts of genetic information into concise clinical reports.
|
14
|
Niehus S, Jónsson H, Schönberger J, Björnsson E, Beyter D, Eggertsson HP, Sulem P, Stefánsson K, Halldórsson BV, Kehr B. PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes. Nat Commun 2021; 12:730. [PMID: 33526789 PMCID: PMC7851401 DOI: 10.1038/s41467-020-20850-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/22/2019] [Accepted: 12/14/2020] [Indexed: 12/14/2022] Open
Abstract
Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare and de novo deletions. On genomes with available high-confidence reference call sets PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enables routine scans for deletions in large-scale sequencing studies.
Affiliation(s)
- Sebastian Niehus
  - Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
  - Charité-Universitätsmedizin Berlin, Berlin, Germany
- Janina Schönberger
  - Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
- Eythór Björnsson
  - deCODE genetics/Amgen Inc., Reykjavík, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavík, Iceland
  - Department of Internal Medicine, Landspítali-The National University Hospital of Iceland, Reykjavík, Iceland
- Kári Stefánsson
  - deCODE genetics/Amgen Inc., Reykjavík, Iceland
  - Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavík, Iceland
- Bjarni V Halldórsson
  - deCODE genetics/Amgen Inc., Reykjavík, Iceland
  - School of Science and Engineering, Reykjavik University, Reykjavík, Iceland
- Birte Kehr
  - Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
  - Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Universität Regensburg, Regensburg, Germany
|
15
|
Lee YG, Lee JY, Kim J, Kim YJ. Insertion variants missing in the human reference genome are widespread among human populations. BMC Biol 2020; 18:167. [PMID: 33187521 PMCID: PMC7666470 DOI: 10.1186/s12915-020-00894-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/13/2020] [Accepted: 10/09/2020] [Indexed: 01/07/2023] Open
Abstract
Background: Structural variants comprise diverse genomic arrangements including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified variants, mainly due to ascertainment bias in the reference genome, lack of previous sequence knowledge, and low complexity of typical insertion sequences. Though recent developments in long-read sequencing deliver promise in annotating individual non-reference insertions, population-level catalogues on non-reference insertion variants have not been identified and the possible functional roles of these hidden variants remain elusive.
Results: To detect non-reference insertion variants, we developed a pipeline, InserTag, which generates non-reference contigs by local de novo assembly and then infers the full-sequence of insertion variants by tracing contigs from non-human primates and other human genome assemblies. Application of the pipeline to data from 2535 individuals of the 1000 Genomes Project helped identify 1696 non-reference insertion variants and re-classify the variants as retention of ancestral sequences or novel sequence insertions based on the ancestral state. Genotyping of the variants showed that individuals had, on average, 0.92-Mbp sequences missing from the reference genome, 92% of the variants were common (allele frequency > 5%) among human populations, and more than half of the variants were major alleles. Among human populations, African populations were the most divergent and had the most non-reference sequences, which was attributed to the greater prevalence of high-frequency insertion variants. The subsets of insertion variants were in high linkage disequilibrium with phenotype-associated SNPs and showed signals of recent continent-specific selection.
Conclusions: Non-reference insertion variants represent an important type of genetic variation in the human population, and our developed pipeline, InserTag, provides the frameworks for the detection and genotyping of non-reference sequences missing from human populations.
Supplementary information: Supplementary information accompanies this paper at 10.1186/s12915-020-00894-1.
Affiliation(s)
- Young-Gun Lee
  - Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea
- Jin-Young Lee
  - Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea
- Junhyong Kim
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Young-Joon Kim
  - Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea
  - Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea
|
16
|
Uchiyama Y, Yamaguchi D, Iwama K, Miyatake S, Hamanaka K, Tsuchida N, Aoi H, Azuma Y, Itai T, Saida K, Fukuda H, Sekiguchi F, Sakaguchi T, Lei M, Ohori S, Sakamoto M, Kato M, Koike T, Takahashi Y, Tanda K, Hyodo Y, Honjo RS, Bertola DR, Kim CA, Goto M, Okazaki T, Yamada H, Maegaki Y, Osaka H, Ngu LH, Siew CG, Teik KW, Akasaka M, Doi H, Tanaka F, Goto T, Guo L, Ikegawa S, Haginoya K, Haniffa M, Hiraishi N, Hiraki Y, Ikemoto S, Daida A, Hamano SI, Miura M, Ishiyama A, Kawano O, Kondo A, Matsumoto H, Okamoto N, Okanishi T, Oyoshi Y, Takeshita E, Suzuki T, Ogawa Y, Handa H, Miyazono Y, Koshimizu E, Fujita A, Takata A, Miyake N, Mizuguchi T, Matsumoto N. Efficient detection of copy-number variations using exome data: Batch- and sex-based analyses. Hum Mutat 2020; 42:50-65. [PMID: 33131168 DOI: 10.1002/humu.24129] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/20/2020] [Revised: 09/29/2020] [Accepted: 10/15/2020] [Indexed: 12/16/2022]
Abstract
Many algorithms to detect copy number variations (CNVs) using exome sequencing (ES) data have been reported and evaluated on their sensitivity and specificity, reproducibility, and precision. However, operational optimization of such algorithms for a better performance has not been fully addressed. ES of 1199 samples including 763 patients with different disease profiles was performed. ES data were analyzed to detect CNVs by both the eXome Hidden Markov Model (XHMM) and modified Nord's method. To efficiently detect rare CNVs, we aimed to decrease sequencing biases by analyzing, at the same time, the data of all unrelated samples sequenced in the same flow cell as a batch, and to eliminate sex effects of X-linked CNVs by analyzing female and male sequences separately. We also applied several filtering steps for more efficient CNV selection. The average number of CNVs detected in one sample was <5. This optimization together with targeted CNV analysis by Nord's method identified pathogenic/likely pathogenic CNVs in 34 patients (4.5%, 34/763). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%) patients, whereas the previous protocol identified them in only 14 (9.9%) patients. Thus, this batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
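The batch- and sex-based idea described in the abstract (compare each sample only against samples from the same flow cell, and analyze female and male samples separately for X-linked targets) can be sketched as a simple stratified normalization. The Python below is an assumption-laden illustration, not XHMM or Nord's method: the record fields `id`, `flow_cell`, `sex`, and `depth` and the use of a z-score are hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscores_by_batch_and_sex(samples):
    """Z-score one target's depth within each (flow cell, sex) group.

    samples: list of dicts with hypothetical keys
             'id', 'flow_cell', 'sex', and 'depth'.
    Returns {sample id: z-score}.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s["flow_cell"], s["sex"])].append(s)

    scores = {}
    for members in groups.values():
        depths = [m["depth"] for m in members]
        mu, sd = mean(depths), pstdev(depths)
        for m in members:
            # A group with no spread carries no signal; report 0.0.
            scores[m["id"]] = 0.0 if sd == 0 else (m["depth"] - mu) / sd
    return scores

# X-linked target: males carry one copy, so pooling sexes would make every
# male look like a deletion. Stratifying avoids that.
samples = [
    {"id": "f1", "flow_cell": "FC1", "sex": "F", "depth": 100},
    {"id": "f2", "flow_cell": "FC1", "sex": "F", "depth": 200},
    {"id": "m1", "flow_cell": "FC1", "sex": "M", "depth": 50},
    {"id": "m2", "flow_cell": "FC1", "sex": "M", "depth": 50},
]
print(zscores_by_batch_and_sex(samples))
```

In this toy input the two male samples score 0.0 because they are only compared with each other, whereas pooling them with the female samples would flag both as depleted.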
Affiliation(s)
- Yuri Uchiyama
  - Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Kazuhiro Iwama
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Satoko Miyatake
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan
- Kohei Hamanaka
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomi Tsuchida
  - Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Aoi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiteru Azuma
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Toshiyuki Itai
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ken Saida
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Fukuda
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Futoshi Sekiguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohiro Sakaguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ming Lei
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Sachiko Ohori
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Masamune Sakamoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Mitsuhiro Kato
  - Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Takayoshi Koike
  - National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Yukitoshi Takahashi
  - National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Koichi Tanda
  - Department of Pediatrics, Japanese Red Cross Kyoto Daiichi Hospital, Kyoto, Japan
- Yuki Hyodo
  - Department of Child Neurology, Okayama University Hospital, Okayama, Japan
- Rachel S Honjo
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Debora Romeo Bertola
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Chong Ae Kim
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Masahide Goto
  - Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Tetsuya Okazaki
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hiroyuki Yamada
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Yoshihiro Maegaki
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hitoshi Osaka
  - Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Lock-Hock Ngu
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Ch'ng G Siew
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Keng W Teik
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Manami Akasaka
  - Department of Pediatrics, Iwate Medical University School of Medicine, Morioka, Japan
- Hiroshi Doi
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Fumiaki Tanaka
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohide Goto
  - Division of Neurology, Kanagawa Children's Medical Center, Yokohama, Japan
- Long Guo
  - Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Shiro Ikegawa
  - Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Kazuhiro Haginoya
  - Department of Pediatric Neurology, Miyagi Children's Hospital, Sendai, Japan
- Muzhirah Haniffa
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Nozomi Hiraishi
  - Department of Pediatrics, Yokohama City University Medical Center, Yokohama, Japan
- Yoko Hiraki
  - Hiroshima Municipal Center for Child Health and Development, Hiroshima, Japan
- Satoru Ikemoto
  - Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Atsuro Daida
  - Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Shin-Ichiro Hamano
  - Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
- Masaki Miura
  - Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
  - Department of Pediatrics, Nagaoka Red Cross Hospital, Nagaoka, Japan
- Akihiko Ishiyama
  - Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Osamu Kawano
  - Department of Pediatrics, Hokkaido University Hospital, Sapporo, Japan
- Akane Kondo
  - Clinical Genetics Center, Shikoku Medical Center for Children and Adults, National Hospital Organization, Kagawa, Japan
- Hiroshi Matsumoto
  - Department of Pediatrics, National Defense Medical College, Saitama, Japan
- Nobuhiko Okamoto
  - Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
- Tohru Okanishi
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
  - Department of Child Neurology, Comprehensive Epilepsy Center, Seirei Hamamatsu General Hospital, Hamamatsu, Japan
- Yukimi Oyoshi
  - Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Eri Takeshita
  - Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
- Toshifumi Suzuki
  - Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiyuki Ogawa
  - Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Hiroshi Handa
  - Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
- Yayoi Miyazono
  - Department of Child Health, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Eriko Koshimizu
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Fujita
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Atsushi Takata
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Noriko Miyake
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Takeshi Mizuguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
|
17
|
Kumar S, Harmanci A, Vytheeswaran J, Gerstein MB. SVFX: a machine learning framework to quantify the pathogenicity of structural variants. Genome Biol 2020; 21:274. [PMID: 33168059 PMCID: PMC7650198 DOI: 10.1186/s13059-020-02178-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/04/2019] [Accepted: 10/12/2020] [Indexed: 02/07/2023] Open
Abstract
There is a lack of approaches for identifying pathogenic genomic structural variants (SVs) although they play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
Affiliation(s)
- Sushant Kumar
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Arif Harmanci
  - Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, 77030, USA
- Jagath Vytheeswaran
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
- Mark B Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
  - Department of Computer Science, Yale University, 260/266 Whitney Avenue, PO Box 208114, New Haven, CT, 06520, USA
|
18
|
Sekar S, Tomasini L, Proukakis C, Bae T, Manlove L, Jang Y, Scuderi S, Zhou B, Kalyva M, Amiri A, Mariani J, Sedlazeck FJ, Urban AE, Vaccarino FM, Abyzov A. Complex mosaic structural variations in human fetal brains. Genome Res 2020; 30:1695-1704. [PMID: 33122304 PMCID: PMC7706730 DOI: 10.1101/gr.262667.120] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/23/2020] [Accepted: 09/12/2020] [Indexed: 11/24/2022]
Abstract
Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ∼14 wk postconception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extrachromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ∼10% of neurons, SVs in developing human brain affect a comparable number of bases in the genome (∼6200 vs. ∼4000 bp), implying that they may have similar functional consequences.
Affiliation(s)
- Shobana Sekar
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Livia Tomasini
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Christos Proukakis
  - Department of Clinical and Movement Neurosciences, Queen Square Institute of Neurology, University College London, London NW3 2PF, United Kingdom
- Taejeong Bae
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Logan Manlove
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Yeongjun Jang
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Soraya Scuderi
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Bo Zhou
  - Departments of Psychiatry and Genetics, Stanford University, Palo Alto, California 94305, USA
- Maria Kalyva
  - Department of Clinical and Movement Neurosciences, Queen Square Institute of Neurology, University College London, London NW3 2PF, United Kingdom
- Anahita Amiri
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Jessica Mariani
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Alexander E Urban
  - Departments of Psychiatry and Genetics, Stanford University, Palo Alto, California 94305, USA
- Flora M Vaccarino
  - Child Study Center and Department of Neuroscience, Yale University, New Haven, Connecticut 06520, USA
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
|
19
|
Rommel Fuentes R, Hesselink T, Nieuwenhuis R, Bakker L, Schijlen E, van Dooijeweert W, Diaz Trivino S, de Haan JR, Sanchez Perez G, Zhang X, Fransz P, de Jong H, van Dijk ADJ, de Ridder D, Peters SA. Meiotic recombination profiling of interspecific hybrid F1 tomato pollen by linked read sequencing. Plant J 2020; 102:480-492. [PMID: 31820490 DOI: 10.1111/tpj.14640] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/24/2019] [Revised: 11/25/2019] [Accepted: 12/04/2019] [Indexed: 06/10/2023]
Abstract
Genome-wide screening of pooled pollen samples from a single interspecific F1 hybrid, obtained from a cross between tomato (Solanum lycopersicum) and its wild relative Solanum pimpinellifolium, using linked read sequencing of the haploid nuclei allowed profiling of the crossover (CO) and gene conversion (GC) landscape. We observed a striking overlap between cold regions of CO in the male gametes and our previously established F6 recombinant inbred lines (RILs) population. COs were overrepresented in non-coding regions in the gene promoter and 5'UTR regions of genes. Poly-A/T and AT rich motifs were found enriched in 1 kb promoter regions flanking the CO sites. Non-crossover associated allelic and ectopic GCs were detected in most chromosomes, confirming that besides CO, GC also represents a source of genetic diversity and genome plasticity in tomato. Furthermore, we identified processed break junctions pointing at the involvement of both homology directed and non-homology directed repair pathways, suggesting a recombination machinery in tomato that is more complex than currently anticipated.
Affiliation(s)
- Roven Rommel Fuentes
  - Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Thamara Hesselink
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Ronald Nieuwenhuis
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Linda Bakker
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Elio Schijlen
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Willem van Dooijeweert
  - Centre for Genetic Resources, Wageningen University and Research, Wageningen, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Sara Diaz Trivino
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Jorn R de Haan
  - Genetwister Technologies B.V., Nieuwe Kanaal 7b, 6709 PA, Wageningen, The Netherlands
- Gabino Sanchez Perez
  - Genetwister Technologies B.V., Nieuwe Kanaal 7b, 6709 PA, Wageningen, The Netherlands
- Xinyue Zhang
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
- Paul Fransz
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
- Hans de Jong
  - Laboratory of Genetics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Aalt D J van Dijk
  - Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
  - Biometris, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Sander A Peters
  - Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
|
20
|
Akdemir KC, Le VT, Chandran S, Li Y, Verhaak RG, Beroukhim R, Campbell PJ, Chin L, Dixon JR, Futreal PA. Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nat Genet 2020; 52:294-305. [PMID: 32024999 PMCID: PMC7058537 DOI: 10.1038/s41588-019-0564-y] [Citation(s) in RCA: 149] [Impact Index Per Article: 37.3] [Received: 12/20/2017] [Accepted: 12/03/2019] [Indexed: 12/21/2022]
Abstract
Chromatin is folded into successive layers to organize linear DNA. Genes within the same topologically associating domains (TADs) demonstrate similar expression and histone-modification profiles, and boundaries separating different domains have important roles in reinforcing the stability of these features. Indeed, domain disruptions in human cancers can lead to misregulation of gene expression. However, the frequency of domain disruptions in human cancers remains unclear. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumor types, we analyzed 288,457 somatic structural variations (SVs) to understand the distributions and effects of SVs across TADs. Notably, SVs can lead to the fusion of discrete TADs, and complex rearrangements markedly change chromatin folding maps in the cancer genomes. Notably, only 14% of the boundary deletions resulted in a change in expression in nearby genes of more than twofold.
Affiliation(s)
- Kadir C Akdemir
  - Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Victoria T Le
  - Salk Institute for Biological Studies, La Jolla, CA, USA
- Yilong Li
  - Wellcome Trust Sanger Institute, Cambridge, UK
- Roel G Verhaak
  - Division of Computational Biology, The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Rameen Beroukhim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Peter J Campbell
  - Wellcome Trust Sanger Institute, Cambridge, UK
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Lynda Chin
  - Institute for Health Transformation University of Texas, Houston, TX, USA
- Jesse R Dixon
  - Salk Institute for Biological Studies, La Jolla, CA, USA
- P Andrew Futreal
  - Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
|
21
|
Kanda S, Ohmuraya M, Akagawa H, Horita S, Yoshida Y, Kaneko N, Sugawara N, Ishizuka K, Miura K, Harita Y, Yamamoto T, Oka A, Araki K, Furukawa T, Hattori M. Deletion in the Cobalamin Synthetase W Domain-Containing Protein 1 Gene Is Associated with Congenital Anomalies of the Kidney and Urinary Tract. J Am Soc Nephrol 2020; 31:139-147. [PMID: 31862704 PMCID: PMC6934996 DOI: 10.1681/asn.2019040398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/21/2019] [Accepted: 10/02/2019] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND: Researchers have identified about 40 genes with mutations that result in the most common cause of CKD in children, congenital anomalies of the kidney and urinary tract (CAKUT), but approximately 85% of patients with CAKUT lack mutations in these genes. The anomalies that comprise CAKUT are clinically heterogenous, and thought to be caused by disturbances at different points in kidney development. However, identification of novel CAKUT-causing genes remains difficult because of their variable expressivity, incomplete penetrance, and heterogeneity. METHODS: We investigated two generations of a family that included two siblings with CAKUT. Although the parents and another child were healthy, the two affected siblings presented the same manifestations, unilateral renal agenesis and contralateral renal hypoplasia. To search for a novel causative gene of CAKUT, we performed whole-exome and whole-genome sequencing of DNA from the family members. We also generated two lines of genetically modified mice with a gene deletion present only in the affected siblings, and performed immunohistochemical and phenotypic analyses of these mice. RESULTS: We found that the affected siblings, but not healthy family members, had a homozygous deletion in the Cobalamin Synthetase W Domain-Containing Protein 1 (CBWD1) gene. Whole-genome sequencing uncovered genomic breakpoints, which involved exon 1 of CBWD1, harboring the initiating codon. Immunohistochemical analysis revealed high expression of Cbwd1 in the nuclei of the ureteric bud cells in the developing kidneys. Cbwd1-deficient mice showed CAKUT phenotypes, including hydronephrosis, hydroureters, and duplicated ureters. CONCLUSIONS: The identification of a deletion in CBWD1 gene in two siblings with CAKUT implies a role for CBWD1 in the etiology of some cases of CAKUT.
Affiliation(s)
- Shoichiro Kanda
  - Department of Pediatrics, The University of Tokyo, Tokyo, Japan;
  - Department of Pediatric Nephrology
- Masaki Ohmuraya
  - Department of Genetics, Hyogo College of Medicine, Hyogo, Japan
- Hiroyuki Akagawa
  - Tokyo Women's Medical University Institute for Integrated Medical Sciences, Tokyo, Japan
- Shigeru Horita
  - Department of Pathology, Kidney Center, School of Medicine, and
- Yutaka Harita
  - Department of Pediatrics, The University of Tokyo, Tokyo, Japan
- Toshiyuki Yamamoto
  - Tokyo Women's Medical University Institute for Integrated Medical Sciences, Tokyo, Japan
  - Institute of Medical Genetics, Tokyo Women's Medical University, Tokyo, Japan
- Akira Oka
  - Department of Pediatrics, The University of Tokyo, Tokyo, Japan
- Kimi Araki
  - Division of Developmental Genetics, Institute of Resource Development and Analysis, Kumamoto University, Kumamoto, Japan; and
- Toru Furukawa
  - Tokyo Women's Medical University Institute for Integrated Medical Sciences, Tokyo, Japan
  - Department of Investigative Pathology, Tohoku University Graduate School of Medicine, Sendai, Japan
|
22
|
Li C, Jiang Y, Li S. LEMON: a method to construct the local strains at horizontal gene transfer sites in gut metagenomics. BMC Bioinformatics 2019; 20:702. [PMID: 31881904 PMCID: PMC6933643 DOI: 10.1186/s12859-019-3301-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/20/2019] [Accepted: 12/02/2019] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND: Horizontal Gene Transfer (HGT) refers to the transfer of genetic materials between organisms through mechanisms other than parent-offspring inheritance. HGTs may affect human health through a large number of microorganisms, especially the gut microbiomes which the human body harbors. The transferred segments may lead to complicated local genome structural variations. Details of the local genome structure can elucidate the effects of the HGTs. RESULTS: In this work, we propose a graph-based method to reconstruct the local strains from the gut metagenomics data at the HGT sites. The method is implemented in a package named LEMON. The simulated results indicate that the method can identify transferred segments accurately on reference sequences of the microbiome. Simulation results illustrate that LEMON could recover local strains with complicated structural variation. Furthermore, the gene fusion points detected in real data near HGT breakpoints validate the accuracy of LEMON. Some strains reconstructed by LEMON have a replication time profile with lower standard error, which demonstrates that HGT events recovered by LEMON are reliable. CONCLUSIONS: Through LEMON we could reconstruct the sequence structure of bacteria that harbor HGT events. This helps us to study gene flow among different microbial species.
Affiliation(s)
- Chen Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, HongKong, China
- Yiqi Jiang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, HongKong, China
- Shuaicheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, HongKong, China
|
23
|
Roychowdhury T, Abyzov A. Chromatin organization modulates the origin of heritable structural variations in human genome. Nucleic Acids Res 2019; 47:2766-2777. [PMID: 30773596 PMCID: PMC6451188 DOI: 10.1093/nar/gkz103] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/04/2018] [Revised: 02/06/2019] [Accepted: 02/14/2019] [Indexed: 12/11/2022] Open
Abstract
Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26 927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin, e.g. deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization while processed pseudogenes can create accessible chromatin. Spontaneous double stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion, suggests that the majority of NAHR deletions are non-meiotic, i.e. originate from errors during homology directed repair (HDR) of spontaneous DSBs. In turn, the origin of the spontaneous DSBs is associated with transcription factor binding in accessible chromatin, revealing the vulnerability of functional, open chromatin. The chromatin itself is enriched with repeats, particularly fixed Alu elements that provide the homology required to maintain stability via HDR. Through co-localization of fixed Alus and NAHR deletions in open chromatin we hypothesize that old Alu expansion had a stabilizing role on the human genome.
Affiliation(s)
- Tanmoy Roychowdhury
  - Mayo Clinic, Department of Health Sciences Research, Center for Individualized Medicine, Rochester, MN 55905, USA
- Alexej Abyzov
  - Mayo Clinic, Department of Health Sciences Research, Center for Individualized Medicine, Rochester, MN 55905, USA
|
24
|
Poot M. Concurrent Structural and Single Nucleotide Variation Resulting from a Single Replication-Based Mechanism. Mol Syndromol 2019; 10:183-185. [PMID: 31602189 DOI: 10.1159/000501382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/24/2019] [Indexed: 11/19/2022] Open
|
25
|
Schimmel J, van Schendel R, den Dunnen JT, Tijsterman M. Templated Insertions: A Smoking Gun for Polymerase Theta-Mediated End Joining. Trends Genet 2019; 35:632-644. [PMID: 31296341 DOI: 10.1016/j.tig.2019.06.001] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 04/15/2019] [Revised: 05/27/2019] [Accepted: 06/06/2019] [Indexed: 01/23/2023]
Abstract
A recognized source of disease-causing genome alterations is erroneous repair of broken chromosomes, which can be executed by two distinct mechanisms: non-homologous end joining (NHEJ) and the recently discovered polymerase theta-mediated end joining (TMEJ) pathway. While TMEJ has previously been considered to act as an alternative mechanism backing up NHEJ, recent work points to a role for TMEJ in the repair of replication-associated DNA breaks that are excluded from repair through homologous recombination. Because of its mode of action, TMEJ is intrinsically mutagenic and sometimes leaves behind a recognizable genomic scar when joining chromosome break ends (i.e., 'templated insertions'). This review article focuses on the intriguing observation that this polymerase theta signature is frequently observed in disease alleles, arguing for a prominent role of this double-strand break repair pathway in genome diversification and disease-causing spontaneous mutagenesis in humans.
Affiliation(s)
- Joost Schimmel
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Robin van Schendel
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Johan T den Dunnen
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Marcel Tijsterman
  - Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands.
|
26
|
Abascal F, Juan D, Jungreis I, Kellis M, Martinez L, Rigau M, Rodriguez JM, Vazquez J, Tress ML. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res 2019; 46:7070-7084. [PMID: 29982784 PMCID: PMC6101605 DOI: 10.1093/nar/gky587] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 05/15/2018] [Accepted: 06/18/2018] [Indexed: 12/16/2022] Open
Abstract
Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.
Affiliation(s)
- Federico Abascal
  - Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
- David Juan
  - Comparative Genomics Lab, Instituto de Biologica Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA and Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura Martinez
  - Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
- Maria Rigau
  - Computational Biology Life Sciences Group, Barcelona Supercomputing Center, Barcelona, Spain
- Jose Manuel Rodriguez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Jesus Vazquez
  - Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre, Madrid, Spain
|
27
|
van Heyningen V. Genome sequencing-the dawn of a game-changing era. Heredity (Edinb) 2019; 123:58-66. [PMID: 31189904 PMCID: PMC6781137 DOI: 10.1038/s41437-019-0226-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/04/2019] [Accepted: 04/16/2019] [Indexed: 01/14/2023] Open
Abstract
The development of genome sequencing technologies has revolutionized the biological sciences in ways which could not have been imagined at the time. This article sets out to document the dawning of the age of genomics and to consider the impact of this revolution on biological investigation, our understanding of life, and the relationship between science and society.
Affiliation(s)
- Veronica van Heyningen
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Crewe Road, Edinburgh, EH4 2XU, UK.
  - Institute of Ophthalmology, University College London, 11-43 Bath Street, London, EC1V 9EL, UK.
|
28
|
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019; 20:117. [PMID: 31159850 PMCID: PMC6547561 DOI: 10.1186/s13059-019-1720-5] [Citation(s) in RCA: 236] [Impact Index Per Article: 47.2] [Received: 10/15/2018] [Accepted: 05/20/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SV with high precision and high recall. RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potential good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
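The idea of scoring overlapping calls from a pair of algorithms can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `SV` record, the toy coordinates, the function names, and the 50% reciprocal-overlap matching rule. It keeps the calls from one caller that are supported by a second caller and then computes precision and recall against a truth set.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not a real caller's output format)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def matches(call: SV, others: list[SV], min_ro: float = 0.5) -> bool:
    """True if any SV in `others` reaches the assumed reciprocal-overlap cutoff."""
    return any(reciprocal_overlap(call, o) >= min_ro for o in others)


def overlapping_calls(calls_a: list[SV], calls_b: list[SV], min_ro: float = 0.5) -> list[SV]:
    """Calls from caller A that are also supported by caller B."""
    return [c for c in calls_a if matches(c, calls_b, min_ro)]


def precision_recall(calls: list[SV], truth: list[SV], min_ro: float = 0.5) -> tuple[float, float]:
    """Precision = matched calls / all calls; recall = recovered truth SVs / all truth SVs."""
    true_calls = sum(matches(c, truth, min_ro) for c in calls)
    found_truth = sum(matches(t, calls, min_ro) for t in truth)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for this sketch.
truth = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 6000, "DEL"), SV("chr2", 100, 900, "DUP")]
caller_a = [SV("chr1", 1010, 1990, "DEL"), SV("chr1", 5050, 6100, "DEL"), SV("chr1", 8000, 8500, "DEL")]
caller_b = [SV("chr1", 990, 2010, "DEL"), SV("chr2", 120, 880, "DUP"), SV("chr1", 8100, 8300, "DEL")]

consensus = overlapping_calls(caller_a, caller_b)
print("caller A alone:", precision_recall(caller_a, truth))
print("A and B overlap:", precision_recall(consensus, truth))
```

In this toy case the overlap set gains precision and loses recall relative to caller A alone, which is the trade-off that makes the choice of algorithm pair matter.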
Affiliation(s)
- Shunichi Kosugi
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Xiaoxi Liu
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Chikashi Terao
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Michiaki Kubo
  - RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Yoichiro Kamatani
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
|
29
|
Saranathan N, Biswas B, Patra A, Vivekanandan P. G-quadruplexes may determine the landscape of recombination in HSV-1. BMC Genomics 2019; 20:382. [PMID: 31096907 PMCID: PMC6524338 DOI: 10.1186/s12864-019-5731-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/04/2019] [Accepted: 04/24/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Several lines of evidence suggest that recombination plays a central role in replication and evolution of herpes simplex virus-1 (HSV-1). G-quadruplex (G4)-motifs have been linked to recombination events in human and microbial genomes, but their role in recombination has not been studied in DNA viruses. RESULTS: The availability of near full-length sequences from 40 HSV-1 recombinant strains with exact position of the recombination breakpoints provided us with a unique opportunity to investigate the role of G4-motifs in recombination among herpes viruses. We mapped the G4-motifs in the parental and all the 40 recombinant strains. Interestingly, the genome-wide distribution of breakpoints closely mirrors the G4 densities in the HSV-1 genome; regions of the genome with higher G4 densities had a higher number of recombination breakpoints. Biophysical characterization of oligonucleotides from a subset of predicted G4-motifs confirmed the formation of G-quadruplex structures. Our analysis also reveals that G4-motifs are enriched in regions flanking the recombination breakpoints. Interestingly, about 11% of breakpoints lie within a G4-motif, making these DNA secondary structures hotspots for recombination in the HSV-1 genome. Breakpoints within G4-motifs predominantly lie within G4-clusters rather than individual G4-motifs. Of note, we found that the terminal guanosine of G4-clusters at the boundaries of the UL (unique long) region on either side of the OriL (origin of replication within UL) represented the commonest breakpoint among the HSV-1 recombinants. CONCLUSION: Our findings suggest a correlation between the HSV-1 recombination landscape and the distribution of G4-motifs and G4-clusters, with possible implications for the evolution of DNA viruses.
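Mapping G4-motifs and asking how many breakpoints fall inside them can be sketched in a few lines of Python. The abstract does not state how motifs were predicted, so the regular expression below (four runs of at least three guanines separated by loops of 1 to 7 bases) is an assumption borrowed from a commonly used canonical pattern, and the sequence, breakpoint positions, and function names are invented for illustration. The sketch scans only the given strand and reports non-overlapping matches.

```python
import re

# Assumed canonical G4 pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (forward strand only).
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) half-open coordinates of non-overlapping putative G4 motifs."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(sequence.upper())]


def fraction_in_motifs(breakpoints: list[int], motifs: list[tuple[int, int]]) -> float:
    """Fraction of breakpoint positions that fall inside any motif interval."""
    if not breakpoints:
        return 0.0
    inside = sum(any(start <= bp < end for start, end in motifs) for bp in breakpoints)
    return inside / len(breakpoints)


# Toy sequence with one putative motif at positions 4..18 (0-based).
sequence = "ATAT" + "GGGAGGGTGGGCGGG" + "ATATATAT"
motifs = find_g4_motifs(sequence)
breakpoints = [10, 22]  # hypothetical breakpoint positions

print(motifs)
print(fraction_in_motifs(breakpoints, motifs))
```

A fuller analysis would also scan the reverse strand (runs of C) and compare the observed fraction with an expectation from randomly placed breakpoints.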
Affiliation(s)
- Nandhini Saranathan
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Banhi Biswas
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India
- Anupam Patra
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, India
- Perumal Vivekanandan
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, India.
|
30
|
Lin YL, Gokcumen O. Fine-Scale Characterization of Genomic Structural Variation in the Human Genome Reveals Adaptive and Biomedically Relevant Hotspots. Genome Biol Evol 2019; 11:1136-1151. [PMID: 30887040 PMCID: PMC6475128 DOI: 10.1093/gbe/evz058] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Accepted: 03/16/2019] [Indexed: 12/25/2022] Open
Abstract
Genomic structural variants (SVs) are distributed nonrandomly across the human genome. The "hotspots" of SVs have been implicated in evolutionary innovations, as well as medical conditions. However, the evolutionary and biomedical features of these hotspots remain incompletely understood. Here, we analyzed data from 2,504 genomes to construct a refined map of 1,148 SV hotspots in human genomes. We confirmed that segmental duplication-related nonallelic homologous recombination is an important mechanistic driver of SV hotspot formation. However, to our surprise, we also found that a majority of SVs in hotspots do not form through such recombination-based mechanisms, suggesting diverse mechanistic and selective forces shaping hotspots. Indeed, our evolutionary analyses showed that the majority of SV hotspots are within gene-poor regions and evolve under relaxed negative selection or neutrality. However, we still found a small subset of SV hotspots harboring genes that are enriched for anthropologically crucial functions and evolve under geography-specific and balancing adaptive forces. These include two independent hotspots on different chromosomes affecting alpha and beta hemoglobin gene clusters. Biomedically, we found that the SV hotspots coincide with breakpoints of clinically relevant, large de novo SVs, significantly more often than genome-wide expectations. For example, we showed that the breakpoints of multiple large SVs, which lead to idiopathic short stature, coincide with SV hotspots. Therefore, the mutational instability in SV hotspots likely enables chromosomal breaks that lead to pathogenic structural variation formations. Overall, our study contributes to a better understanding of the mutational and adaptive landscape of the genome.
Affiliation(s)
- Yen-Lung Lin
- Department of Biological Sciences, University at Buffalo
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo
- Corresponding author
|
31
|
Beck CR, Carvalho CMB, Akdemir ZC, Sedlazeck FJ, Song X, Meng Q, Hu J, Doddapaneni H, Chong Z, Chen ES, Thornton PC, Liu P, Yuan B, Withers M, Jhangiani SN, Kalra D, Walker K, English AC, Han Y, Chen K, Muzny DM, Ira G, Shaw CA, Gibbs RA, Hastings PJ, Lupski JR. Megabase Length Hypermutation Accompanies Human Structural Variation at 17p11.2. Cell 2019; 176:1310-1324.e10. [PMID: 30827684 PMCID: PMC6438178 DOI: 10.1016/j.cell.2019.01.045] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Revised: 11/06/2018] [Accepted: 01/25/2019] [Indexed: 01/16/2023]
Abstract
DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ∼1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.
Affiliation(s)
- Christine R Beck
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Zeynep C Akdemir
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Xiaofei Song
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Qingchang Meng
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Jianhong Hu
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Zechen Chong
- Department of Genetics and the Informatics Institute, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Edward S Chen
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Philip C Thornton
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Pengfei Liu
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Bo Yuan
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Marjorie Withers
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Divya Kalra
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Adam C English
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Yi Han
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Donna M Muzny
- Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- Grzegorz Ira
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Chad A Shaw
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA
- Richard A Gibbs
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA; Human Genome Sequencing Center, BCM, Houston, TX 77030, USA
- P J Hastings
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA; Dan L. Duncan Comprehensive Cancer Center, BCM, Houston, TX 77030, USA.
- James R Lupski
- Department of Molecular and Human Genetics, BCM, Houston, TX 77030, USA; Human Genome Sequencing Center, BCM, Houston, TX 77030, USA; Department of Pediatrics, BCM, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA; Dan L. Duncan Comprehensive Cancer Center, BCM, Houston, TX 77030, USA.
|
32
|
Rigau M, Juan D, Valencia A, Rico D. Intronic CNVs and gene expression variation in human populations. PLoS Genet 2019; 15:e1007902. [PMID: 30677042 PMCID: PMC6345438 DOI: 10.1371/journal.pgen.1007902] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2018] [Accepted: 12/17/2018] [Indexed: 11/19/2022] Open
Abstract
Introns can be extraordinarily large and they account for the majority of the DNA sequence in human genes. However, little is known about their population patterns of structural variation and their functional implication. By combining the most extensive maps of CNVs in human populations, we have found that intronic losses are the most frequent copy number variants (CNVs) in protein-coding genes in human, with 12,986 intronic deletions, affecting 4,147 genes (including 1,154 essential genes and 1,638 disease-related genes). This intronic length variation results in dozens of genes showing extreme population variability in size, with 40 genes with 10 or more different sizes and up to 150 allelic sizes. Intronic losses are frequent in evolutionarily ancient genes that are highly conserved at the protein sequence level. This result contrasts with losses overlapping exons, which are observed less often than expected by chance and almost exclusively affect primate-specific genes. An integrated analysis of CNVs and RNA-seq data showed that intronic loss can be associated with significant differences in gene expression levels in the population (CNV-eQTLs). These intronic CNV-eQTLs regions are enriched for intronic enhancers and can be associated with expression differences of other genes showing long distance intron-promoter 3D interactions. Our data suggests that intronic structural variation of protein-coding genes makes an important contribution to the variability of gene expression and splicing in human populations.
Affiliation(s)
- Maria Rigau
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- David Juan
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas–Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Daniel Rico
- Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
|
33
|
Bravo P, Darvish H, Tafakhori A, Azcona LJ, Johari AH, Jamali F, Paisán-Ruiz C. Molecular characterization of PRKN structural variations identified through whole-genome sequencing. Mol Genet Genomic Med 2018; 6:1243-1248. [PMID: 30328284 PMCID: PMC6305656 DOI: 10.1002/mgg3.482] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Revised: 08/21/2018] [Accepted: 08/29/2018] [Indexed: 12/16/2022] Open
Abstract
Background Early‐onset Parkinson's disease (PD) is the most common inherited form of parkinsonism, with PRKN being the most frequently mutated gene identified. Exon rearrangements, identified in about 43.2% of the reported PD patients and with higher frequency in specific ethnicities, are the most prevalent PRKN mutations reported to date in PD patients. Methods In this study, three consanguineous families with early‐onset PD were subjected to whole‐genome sequencing (WGS) analyses that were followed by Sanger sequencing and droplet digital PCR to validate and confirm the disease segregation of the identified genomic variations and to determine their parental origin. Results Five different PRKN structural variations (SVs) were identified. Because the genomic sequences surrounding the break points of the identified SVs might hold important information about their genesis, these were also characterized for the presence of homology and repeated sequences. Conclusion We concluded that all identified PRKN SVs might originate through retrotransposition events.
Affiliation(s)
- Paloma Bravo
- Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York
- Hossein Darvish
- Department of Medical Genetics, Semnan University of Medical Sciences, Semnan, Iran
- Abbas Tafakhori
- Department of Neurology, School of Medicine, Imam Khomeini Hospital and Iranian Center of Neurological Research, Tehran University of Medical Sciences, Tehran, Iran
- Luis J Azcona
- Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York; Department of Neurosciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York
- Amir Hossein Johari
- Department of Medical Genetics, Semnan University of Medical Sciences, Semnan, Iran
- Faezeh Jamali
- Department of Medical Genetics, Semnan University of Medical Sciences, Semnan, Iran
- Coro Paisán-Ruiz
- Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York; Department of Genetics and Genomic sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York; The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York; The Friedman Brain and Mindich Child Health and Development Institutes, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, New York
|
34
|
Analysis of intragenic USH2A copy number variation unveils broad spectrum of unique and recurrent variants. Eur J Med Genet 2018; 61:621-626. [DOI: 10.1016/j.ejmg.2018.04.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2017] [Revised: 04/09/2018] [Accepted: 04/11/2018] [Indexed: 11/21/2022]
|
35
|
The Genome of the Human Pathogen Candida albicans Is Shaped by Mutation and Cryptic Sexual Recombination. mBio 2018; 9:mBio.01205-18. [PMID: 30228236 PMCID: PMC6143739 DOI: 10.1128/mbio.01205-18] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
The opportunistic fungal pathogen Candida albicans lacks a conventional sexual program and is thought to evolve, at least primarily, through the clonal acquisition of genetic changes. Here, we performed an analysis of heterozygous diploid genomes from 21 clinical isolates to determine the natural evolutionary processes acting on the C. albicans genome. Mutation and recombination shaped the genomic landscape among the C. albicans isolates. Strain-specific single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) clustered across the genome. Additionally, loss-of-heterozygosity (LOH) events contributed substantially to genotypic variation, with most long-tract LOH events extending to the ends of the chromosomes suggestive of repair via break-induced replication. Consistent with a model of inheritance by descent, most polymorphisms were shared between closely related strains. However, some isolates contained highly mosaic genomes consistent with strains having experienced interclade recombination during their evolutionary history. A detailed examination of mitochondrial genomes also revealed clear examples of interclade recombination among sequenced strains. These analyses therefore establish that both (para)sexual recombination and mitotic mutational processes drive evolution of this important pathogen. To further facilitate the study of C. albicans genomes, we also introduce an online platform, SNPMap, to examine SNP patterns in sequenced isolates. IMPORTANCE Mutations introduce variation into the genome upon which selection can act. Defining the nature of these changes is critical for determining species evolution, as well as for understanding the genetic changes driving important cellular processes. The heterozygous diploid fungus Candida albicans is both a frequent commensal organism and a prevalent opportunistic pathogen. A prevailing theory is that C. albicans evolves primarily through the gradual buildup of mitotic mutations, and a pressing issue is whether sexual or parasexual processes also operate within natural populations. Here, we establish that the C. albicans genome evolves by a combination of localized mutation and both short-tract and long-tract loss-of-heterozygosity (LOH) events within the sequenced isolates. Mutations are more prevalent within noncoding and heterozygous regions and LOH increases towards chromosome ends. Furthermore, we provide evidence for genetic exchange between isolates, establishing that sexual or parasexual processes have contributed to the diversity of both nuclear and mitochondrial genomes.
|
36
|
Kehrer‐Sawatzki H, Kordes U, Seiffert S, Summerer A, Hagel C, Schüller U, Farschtschi S, Schneppenheim R, Bendszus M, Godel T, Mautner V. Co-occurrence of schwannomatosis and rhabdoid tumor predisposition syndrome 1. Mol Genet Genomic Med 2018; 6:627-637. [PMID: 29779243 PMCID: PMC6081224 DOI: 10.1002/mgg3.412] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2018] [Revised: 03/31/2018] [Accepted: 04/18/2018] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND The clinical phenotype associated with germline SMARCB1 mutations has as yet not been fully documented. It is known that germline SMARCB1 mutations may cause rhabdoid tumor predisposition syndrome (RTPS1) or schwannomatosis. However, the co-occurrence of rhabdoid tumor and schwannomas in the same patient has not so far been reported. METHODS We investigated a family with members harboring a germline SMARCB1 deletion by means of whole-body MRI as well as high-resolution microstructural magnetic resonance neurography (MRN). Breakpoint-spanning PCRs were performed to characterize the SMARCB1 deletion and its segregation in the family. RESULTS The index patient of this family was in complete continuous remission for an atypical teratoid/rhabdoid tumor (AT/RT) treated at the age of 2 years. However, at the age of 21 years, she exhibited paraparesis of her legs and MRI investigations revealed multiple intrathoracic and spinal schwannomas. Breakpoint-spanning PCRs indicated that the germline deletion segregating in the family encompasses 6.4-kb and includes parts of SMARCB1 intron 7, exons 8-9 and 3.3-kb located telomeric to exon 9 including the SMARCB1 3' UTR. The analysis of sequences at the deletion breakpoints showed that the deletion has been caused by replication errors including template-switching. The patient had inherited the deletion from her 56-year-old healthy mother who did not exhibit schwannomas or other tumors as determined by whole-body MRI. However, MRN of the peripheral nerves of the mother's extremities revealed multiple fascicular microlesions which have been previously identified as indicative of schwannomatosis-associated subclinical peripheral nerve pathology. CONCLUSION The occurrence of schwannomatosis-associated clinical symptoms independent of the AT/RT as the primary disease should be considered in long-term survivors of AT/RT. 
Furthermore, our investigations indicate that germline SMARCB1 mutation carriers not presenting RTs or schwannomatosis-associated clinical symptoms may nevertheless exhibit peripheral nerve pathology as revealed by MRN.
Affiliation(s)
- Uwe Kordes
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Anna Summerer
- Institute of Human Genetics, University of Ulm, Ulm, Germany
- Christian Hagel
- Institute of Neuropathology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Ulrich Schüller
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Institute of Neuropathology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Said Farschtschi
- Department of Neurology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Reinhard Schneppenheim
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Martin Bendszus
- Department of Neuroradiology, University of Heidelberg Medical Center, Heidelberg, Germany
- Tim Godel
- Department of Neuroradiology, University of Heidelberg Medical Center, Heidelberg, Germany
- Victor‐Felix Mautner
- Department of Neurology, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
|
37
|
Yang R, Fang S, Wang J, Zhang C, Zhang R, Liu D, Zhao Y, Hu X, Li N. Genome-wide analysis of structural variants reveals genetic differences in Chinese pigs. PLoS One 2017; 12:e0186721. [PMID: 29065176 PMCID: PMC5655481 DOI: 10.1371/journal.pone.0186721] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Accepted: 10/08/2017] [Indexed: 11/19/2022] Open
Abstract
Pigs have experienced long-term selections, resulting in dramatic phenotypic changes. Structural variants (SVs) are reported to exert extensive impacts on phenotypic changes. We built a high resolution and informative SV map based on high-depth sequencing data from 66 Chinese domestic and wild pigs. We inferred the SV formation mechanisms in the pig genome and used SVs as materials to perform a population-level analysis. We detected the selection signals on chromosome X for northern Chinese domestic pigs, as well as the differentiated loci across the whole genome. Analysis showed that these loci differ between southern and northern Chinese domestic pigs. Our results based on SVs provide new insights into genetic differences in Chinese pigs.
Affiliation(s)
- Ruifei Yang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, P. R. China
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Suyun Fang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Jing Wang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, P. R. China
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Chunyuan Zhang
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, P. R. China
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Ran Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Di Liu
- Institute of Animal Industry, Heilongjiang Academy of Agricultural Sciences, Harbin, P. R. China
- Yiqiang Zhao
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing, P. R. China
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- Corresponding author
- Xiaoxiang Hu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
- National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, P. R. China
- Corresponding author
- Ning Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, P. R. China
|
38
|
Ahn YJ, Markkandan K, Baek IP, Mun S, Lee W, Kim HS, Han K. An efficient and tunable parameter to improve variant calling for whole genome and exome sequencing data. Genes Genomics 2017; 40:39-47. [PMID: 29892897 DOI: 10.1007/s13258-017-0608-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2017] [Accepted: 08/23/2017] [Indexed: 12/30/2022]
Abstract
Next generation sequencing (NGS) has been applied in fields ranging from agriculture to clinical practice, and many sequencing platforms are available for obtaining accurate and consistent results. However, these platforms showed amplification bias when facilitating variant calls in personal genomes. Here, we sequenced whole genomes and whole exomes from ten Korean individuals using Illumina and Ion Proton, respectively, to assess the vulnerability and accuracy of each NGS platform in GC-rich and GC-poor regions. Overall, a total of 1013 Gb reads from Illumina and ~39.1 Gb reads from Ion Proton were analyzed using the BWA-GATK variant calling pipeline. Furthermore, in conjunction with the VQSR tool and detailed filtering strategies, we achieved high-quality variants. Finally, ten variants each from the Illumina-only, Ion Proton-only, and intersection sets were selected for Sanger validation. The validation results revealed that the Illumina platform showed higher accuracy than Ion Proton. The described filtering methods are advantageous for large population-based whole genome studies designed to identify common and rare variations associated with complex diseases.
Affiliation(s)
- Yong Ju Ahn
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, Republic of Korea; Theragen Etex Inc., Suwon, Republic of Korea
- In-Pyo Baek
- Theragen Etex Inc., Suwon, Republic of Korea
- Seyoung Mun
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, Republic of Korea; DKU-Theragen Institute for NGS analysis (DTiNa), Cheonan, Republic of Korea
- Wooseok Lee
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, Republic of Korea; DKU-Theragen Institute for NGS analysis (DTiNa), Cheonan, Republic of Korea
- Heui-Soo Kim
- Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan, Republic of Korea
- Kyudong Han
- Department of Nanobiomedical Science & BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, Republic of Korea; DKU-Theragen Institute for NGS analysis (DTiNa), Cheonan, Republic of Korea
|
39
|
Zhang Y, Li S, Abyzov A, Gerstein MB. Landscape and variation of novel retroduplications in 26 human populations. PLoS Comput Biol 2017; 13:e1005567. [PMID: 28662076 PMCID: PMC5510864 DOI: 10.1371/journal.pcbi.1005567] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2016] [Revised: 07/14/2017] [Accepted: 05/12/2017] [Indexed: 01/10/2023] Open
Abstract
Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
Affiliation(s)
- Yan Zhang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, United States of America
- Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
- Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
|
40
|
Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat Rev Mol Cell Biol 2017; 18:495-506. [PMID: 28512351 DOI: 10.1038/nrm.2017.48] [Citation(s) in RCA: 997] [Impact Index Per Article: 142.4] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
DNA double-strand breaks (DSBs) are the most dangerous type of DNA damage because they can result in the loss of large chromosomal regions. In all mammalian cells, DSBs that occur throughout the cell cycle are repaired predominantly by the non-homologous DNA end joining (NHEJ) pathway. Defects in NHEJ result in sensitivity to ionizing radiation and the ablation of lymphocytes. The NHEJ pathway utilizes proteins that recognize, resect, polymerize and ligate the DNA ends in a flexible manner. This flexibility permits NHEJ to function on a wide range of DNA-end configurations, with the resulting repaired DNA junctions often containing mutations. In this Review, we discuss the most recent findings regarding the relative involvement of the different NHEJ proteins in the repair of various DNA-end configurations. We also discuss the shunting of DNA-end repair to the auxiliary pathways of alternative end joining (a-EJ) or single-strand annealing (SSA) and the relevance of these different pathways to human disease.
|
41
|
Collins RL, Brand H, Redin CE, Hanscom C, Antolik C, Stone MR, Glessner JT, Mason T, Pregno G, Dorrani N, Mandrile G, Giachino D, Perrin D, Walsh C, Cipicchio M, Costello M, Stortchevoi A, An JY, Currall BB, Seabra CM, Ragavendran A, Margolin L, Martinez-Agosto JA, Lucente D, Levy B, Sanders SJ, Wapner RJ, Quintero-Rivera F, Kloosterman W, Talkowski ME. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol 2017; 18:36. [PMID: 28260531 PMCID: PMC5338099 DOI: 10.1186/s13059-017-1158-6] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 01/20/2017] [Indexed: 12/13/2022] Open
Abstract
Background Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. Conclusions These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Affiliation(s)
- Ryan L Collins
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, 02115, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Harrison Brand
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Claire E Redin
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Carrie Hanscom
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Caroline Antolik
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Matthew R Stone
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Joseph T Glessner
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Tamara Mason
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Giulia Pregno
- Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
| | - Naghmeh Dorrani
- Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
| | - Giorgia Mandrile
- Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
| | - Daniela Giachino
- Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
| | - Danielle Perrin
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Cole Walsh
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Michelle Cipicchio
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Maura Costello
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Alexei Stortchevoi
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA.,Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Joon-Yong An
- Department of Psychiatry, University of California San Francisco, San Francisco, CA, 94103, USA
| | - Benjamin B Currall
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA.,Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Catarina M Seabra
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA.,Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA.,GABBA Program, University of Porto, Porto, 4099-002, Portugal
| | - Ashok Ragavendran
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA.,Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Lauren Margolin
- Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| | - Julian A Martinez-Agosto
- Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
| | - Diane Lucente
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Brynn Levy
- Department of Pathology, Columbia University, New York, NY, 10032, USA
| | - Stephan J Sanders
- Department of Psychiatry, University of California San Francisco, San Francisco, CA, 94103, USA
| | - Ronald J Wapner
- Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Columbia University Medical Center, New York, NY, 10032, USA
| | - Fabiola Quintero-Rivera
- Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
| | - Wigard Kloosterman
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, 3584CG, The Netherlands
| | - Michael E Talkowski
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA.,Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, 02115, USA.,Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
| |
|
42
|
Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet 2017; 49:588-593. [PMID: 28250455 DOI: 10.1038/ng.3801] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 10/10/2016] [Accepted: 02/03/2017] [Indexed: 12/15/2022]
Abstract
Genomes usually contain some non-repetitive sequences that are missing from the reference genome and occur only in a population subset. Such non-repetitive, non-reference (NRNR) sequences have remained largely unexplored in terms of their characterization and downstream analyses. Here we describe 3,791 breakpoint-resolved NRNR sequence variants called using PopIns from whole-genome sequence data of 15,219 Icelanders. We found that over 95% of the 244 NRNR sequences that are 200 bp or longer are present in chimpanzees, indicating that they are ancestral. Furthermore, 149 variant loci are in linkage disequilibrium (r² > 0.8) with a genome-wide association study (GWAS) catalog marker, suggesting disease relevance. Additionally, we report an association (P = 3.8 × 10⁻⁸, odds ratio (OR) = 0.92) with myocardial infarction (23,360 cases, 300,771 controls) for a 766-bp NRNR sequence variant. Our results underline the importance of including variation of all complexity levels when searching for variants that associate with disease.
|
43
|
Nilsson D, Pettersson M, Gustavsson P, Förster A, Hofmeister W, Wincent J, Zachariadis V, Anderlid BM, Nordgren A, Mäkitie O, Wirta V, Käller M, Vezzi F, Lupski JR, Nordenskjöld M, Lundberg ES, Carvalho CMB, Lindstrand A. Whole-Genome Sequencing of Cytogenetically Balanced Chromosome Translocations Identifies Potentially Pathological Gene Disruptions and Highlights the Importance of Microhomology in the Mechanism of Formation. Hum Mutat 2017; 38:180-192. [PMID: 27862604 PMCID: PMC5225243 DOI: 10.1002/humu.23146] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 05/31/2016] [Accepted: 11/01/2016] [Indexed: 11/07/2022]
Abstract
Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, from nonallelic homologous recombination. Here, we use low-coverage mate pair whole-genome sequencing to fine map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted in 48% of the breakpoints; recessive genes in four normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and one XX-male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantive fraction of the analyzed translocations (17.4%; n = 4), an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is a characteristic seen in the breakpoint junctions of rearrangements mediated by error-prone replication-based repair mechanisms. Our data imply that a mechanism involving template switching might contribute to the formation of at least 15% of the interchromosomal translocation events.
Affiliation(s)
- Daniel Nilsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet Science Park, 171 21 Solna, Sweden
| | - Maria Pettersson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Peter Gustavsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Alisa Förster
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Wolfgang Hofmeister
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Josephine Wincent
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Vasilios Zachariadis
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Britt-Marie Anderlid
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Ann Nordgren
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Outi Mäkitie
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Children's Hospital, Helsinki University Central Hospital and University of Helsinki, 00290 Helsinki, Finland
- Folkhälsan Institute of Genetics, 00290 Helsinki, Finland
| | - Valtteri Wirta
- SciLifeLab, School of Biotechnology, KTH Royal Institute of Technology, 171 71 Stockholm, Sweden
| | - Max Käller
- SciLifeLab, School of Biotechnology, KTH Royal Institute of Technology, 171 71 Stockholm, Sweden
| | - Francesco Vezzi
- SciLifeLab, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Stockholm, Sweden
| | - James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, 77030 Houston TX, USA
- Texas Children’s Hospital, 77030 Houston TX, USA
| | - Magnus Nordenskjöld
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Elisabeth Syk Lundberg
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Claudia M. B. Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, 77030 Houston TX, USA
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 76 Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| |
|
44
|
|
45
|
Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016; 239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Affiliation(s)
- Chanson J Brumme
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
| | - Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, London, Ontario, Canada.
| |
|
46
|
Poot M. Retrotransposing Gremlins May Disrupt Our Brain's Genomes. Mol Syndromol 2016; 8:55-57. [PMID: 28611545 DOI: 10.1159/000453247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 11/04/2016] [Indexed: 11/19/2022] Open
|
47
|
Poot M. Discovering Patterns of Structural Variation by Mining Molecular Fossils. Mol Syndromol 2016; 7:299-301. [PMID: 27920632 DOI: 10.1159/000450807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/12/2016] [Indexed: 11/19/2022] Open
|
48
|
The genomic landscape of balanced cytogenetic abnormalities associated with human congenital anomalies. Nat Genet 2016; 49:36-45. [PMID: 27841880 DOI: 10.1038/ng.3720] [Citation(s) in RCA: 194] [Impact Index Per Article: 24.3] [Received: 04/04/2016] [Accepted: 10/17/2016] [Indexed: 12/16/2022]
Abstract
Despite the clinical significance of balanced chromosomal abnormalities (BCAs), their characterization has largely been restricted to cytogenetic resolution. We explored the landscape of BCAs at nucleotide resolution in 273 subjects with a spectrum of congenital anomalies. Whole-genome sequencing revised 93% of karyotypes and demonstrated complexity that was cryptic to karyotyping in 21% of BCAs, highlighting the limitations of conventional cytogenetic approaches. At least 33.9% of BCAs resulted in gene disruption that likely contributed to the developmental phenotype, 5.2% were associated with pathogenic genomic imbalances, and 7.3% disrupted topologically associated domains (TADs) encompassing known syndromic loci. Remarkably, BCA breakpoints in eight subjects altered a single TAD encompassing MEF2C, a known driver of 5q14.3 microdeletion syndrome, resulting in decreased MEF2C expression. We propose that sequence-level resolution dramatically improves prediction of clinical outcomes for balanced rearrangements and provides insight into new pathogenic mechanisms, such as altered regulation due to changes in chromosome topology.
|
49
|
Nguyen HT, Boocock J, Merriman TR, Black MA. SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions. Front Genet 2016; 7:160. [PMID: 27695476 PMCID: PMC5023681 DOI: 10.3389/fgene.2016.00160] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/17/2016] [Accepted: 08/24/2016] [Indexed: 12/28/2022] Open
Abstract
Copy-number variation (CNV) has been associated with increased risk of complex diseases. High-throughput sequencing (HTS) technologies facilitate the detection of copy-number variable regions (CNVRs) and their breakpoints. This helps in understanding genome structure as well as their evolution process. Various approaches have been proposed for detecting CNV breakpoints, but currently it is still challenging for tools based on a single analysis method to identify breakpoints of CNVs. It has been shown, however, that pipelines which integrate multiple approaches are able to report more reliable breakpoints. Here, based on HTS data, we have developed a pipeline to identify approximate breakpoints (±10 bp) relating to different ancestral events within a specific CNVR. The pipeline combines read-depth and split-read information to infer breakpoints, using information from multiple samples to allow an imputation approach to be taken. The main steps involve using a normal mixture model to cluster samples into different groups, followed by simple kernel-based approaches to maximize information obtained from read-depth and split-read approaches, after which common breakpoints of groups are inferred. The pipeline uses split-read information directly from CIGAR strings of BAM files, without using a re-alignment step. On simulated data sets, it was able to report breakpoints for very low-coverage samples including those for which only single-end reads were available. When applied to three loci from existing human resequencing data sets (NEGR1, LCE3, IRGM) the pipeline obtained good concordance with results from the 1000 Genomes Project (92, 100, and 82%, respectively). The package is available at https://github.com/hoangtn/SRBreak, and also as a docker-based application at https://registry.hub.docker.com/u/hoangtn/srbreak/.
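The abstract notes that split-read evidence can be read directly from the CIGAR strings of aligned reads, without re-alignment. The sketch below illustrates that one idea only, and it is not SRBreak's implementation: it assumes each read is given as a hypothetical `(pos, cigar)` pair (1-based leftmost aligned position plus CIGAR string, as in SAM records), treats the edge of every sufficiently long soft clip as a candidate breakpoint, and keeps positions supported by several reads. The function names, the `min_clip` and `min_support` thresholds, and the simple counting step are illustrative assumptions; the read-depth clustering and kernel-based steps described in the abstract are not modelled.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos, cigar, min_clip=5):
    """Return 1-based reference positions where soft clips of at least
    min_clip bases meet the aligned part of the read (illustrative helper)."""
    ref = pos              # next reference position to be consumed
    aligned_seen = False   # True once any reference-consuming op has been read
    breakpoints = []
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "S" and length >= min_clip:
            # Leading clip: boundary is the first aligned base.
            # Trailing clip: boundary is the last aligned base.
            breakpoints.append(ref - 1 if aligned_seen else ref)
        elif op in REF_CONSUMING:
            ref += length
            aligned_seen = True
    return breakpoints


def consensus_breakpoints(reads, min_support=2):
    """Count clip boundaries over many (pos, cigar) reads and keep the
    positions seen at least min_support times (assumed threshold)."""
    counts = Counter(
        bp for pos, cigar in reads for bp in clip_breakpoints(pos, cigar)
    )
    return sorted(bp for bp, n in counts.items() if n >= min_support)


# Toy reads: two are clipped after reference position 159, one is clipped
# before position 160, and one aligns end to end.
reads = [(100, "60M40S"), (120, "40M60S"), (160, "30S70M"), (300, "100M")]
print(consensus_breakpoints(reads))  # → [159]
```

In practice the `(pos, cigar)` pairs would come from a BAM reader, and boundaries within a few bases of each other would likely need to be merged before counting, in keeping with the approximate (±10 bp) resolution the abstract reports.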
Affiliation(s)
- Hoang T Nguyen
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA; Department of Mathematics, Cao Thang College of Technology, Ho Chi Minh City, Vietnam
| | - James Boocock
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand; Department of Psychiatry, Mount Sinai School of Medicine, New York, NY, USA
| | - Tony R Merriman
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Virtual Institute of Statistical Genetics, Dunedin, New Zealand
| |
|
50
|
Van Cauwenbergh C, Van Schil K, Cannoodt R, Bauwens M, Van Laethem T, De Jaegere S, Steyaert W, Sante T, Menten B, Leroy BP, Coppieters F, De Baere E. arrEYE: a customized platform for high-resolution copy number analysis of coding and noncoding regions of known and candidate retinal dystrophy genes and retinal noncoding RNAs. Genet Med 2016; 19:457-466. [PMID: 27608171 PMCID: PMC5392597 DOI: 10.1038/gim.2016.119] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 04/13/2016] [Accepted: 07/06/2016] [Indexed: 12/13/2022] Open
Abstract
Purpose: Our goal was to design a customized microarray, arrEYE, for high-resolution copy number variant (CNV) analysis of known and candidate genes for inherited retinal dystrophy (iRD) and retina-expressed noncoding RNAs (ncRNAs). Methods: arrEYE contains probes for the full genomic region of 106 known iRD genes, including those implicated in retinitis pigmentosa (RP) (the most frequent iRD), cone–rod dystrophies, macular dystrophies, and an additional 60 candidate iRD genes and 196 ncRNAs. Eight CNVs in iRD genes identified by other techniques were used as positive controls. The test cohort consisted of 57 patients with autosomal dominant, X-linked, or simplex RP. Results: In an RP patient, a novel heterozygous deletion of exons 7 and 8 of the HGSNAT gene was identified: c.634-408_820+338delinsAGAATATG, p.(Glu212Glyfs*2). A known variant was found on the second allele: c.1843G>A, p.(Ala615Thr). Furthermore, we expanded the allelic spectrum of USH2A and RCBTB1 with novel CNVs. Conclusion: The arrEYE platform revealed subtle single-exon to larger CNVs in iRD genes that could be characterized at the nucleotide level, facilitated by the high resolution of the platform. We report the first CNV in HGSNAT that, combined with another mutation, leads to RP, further supporting its recently identified role in nonsyndromic iRD. Genet Med 19(4), 457–466.
Affiliation(s)
- Caroline Van Cauwenbergh
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Kristof Van Schil
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Robrecht Cannoodt
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium.,Data Mining and Modeling for Biomedicine group, VIB Inflammation Research Center, Ghent, Belgium.,Department of Internal Medicine, Ghent University, Ghent, Belgium
| | - Miriam Bauwens
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Thalia Van Laethem
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Sarah De Jaegere
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Wouter Steyaert
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Tom Sante
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Björn Menten
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Bart P Leroy
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium.,Department of Ophthalmology, Ghent University and Ghent University Hospital, Ghent, Belgium.,Division of Ophthalmology and Center for Cellular & Molecular Therapeutics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Frauke Coppieters
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| | - Elfride De Baere
- Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
| |
|