1
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Rocha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O'Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. bioRxiv: The Preprint Server for Biology 2024:2024.07.31.605654. [PMID: 39131277 PMCID: PMC11312596 DOI: 10.1101/2024.07.31.605654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19103, USA
- Steven J Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Dmitry Antipov
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Francesco Montinaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Yanting Luo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Joanna Malukiewicz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
- Jessica M Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Abigail N Sequeira
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Riley J Mangan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Anton Bankevich
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Christine R Beck
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Matthew Borchers
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Emry Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Shelise Y Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lucia Carbone
- Department of Medicine, KCVI, Oregon Health & Science University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
- Laura Carrel
- PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
- Agnes P Chan
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Juyun Crawford
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Cedric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10021, USA
- Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Luciana de Gennaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- David Gilbert
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Ishaan Gupta
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
- Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Junmin Han
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Gabrielle A Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Research Institute, Goethe University, Frankfurt, Germany
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marlys L Houck
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
- Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX 77005, USA
- Chul Lee
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Youngho Lee
- Laboratory of bioinformatics and population genetics, Interdisciplinary program in bioinformatics, Seoul National University, Republic of Korea
- William Lees
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Yong Hwee Eddie Loh
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
- Juan F I Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Barbara McGrath
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Britta S Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Saswat K Mohanty
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Karol Pal
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Francisca R Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
- Oliver A Ryder
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
- Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
- Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Nicholas J Schork
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
- Cole Shanks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Dongmin R Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Cynthia Steiner
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
- Alexander P Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Michael G Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou 310024, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, 430070, Wuhan, China
- Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
- Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
- Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
- Sarah A Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yixin Zhu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Zachary A Szpiech
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Christian D Huber
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Tobias L Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
- Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Soojin V Yi
- Department of Ecology, Evolution and Marine Biology, Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
- Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, USA
- Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
- Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Department of Molecular and Cell Biology, UConn, Storrs, CT, USA
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
2
Sund KL, Liu J, Lee J, Garbe J, Abdelhamed Z, Maag C, Hallinan B, Wu SW, Sperry E, Deshpande A, Stottmann R, Smolarek TA, Dyer LM, Hestand MS. Long-read sequencing and optical genome mapping identify causative gene disruptions in noncoding sequence in two patients with neurologic disease and known chromosome abnormalities. Am J Med Genet A 2024:e63818. [PMID: 39041659 DOI: 10.1002/ajmg.a.63818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 06/12/2024] [Accepted: 07/07/2024] [Indexed: 07/24/2024]
Abstract
Despite advances in next-generation sequencing (NGS), genetic diagnoses remain elusive for many patients with neurologic syndromes. Long-read sequencing (LRS) and optical genome mapping (OGM) technologies improve on existing capabilities for detecting and interpreting structural variation in repetitive DNA and on a single haplotype, while also providing enhanced breakpoint resolution. We performed LRS and OGM on two patients with known chromosomal rearrangements and inconclusive Sanger sequencing or NGS results. The first patient, who had epilepsy and developmental delay, had a complex translocation between two chromosomes that included insertion and inversion events. The second patient, who had a movement disorder, had an inversion on a single chromosome disrupted by multiple smaller inversions and insertions. Sequence-level resolution of the rearrangements identified pathogenic breaks in noncoding sequence in or near known disease-causing genes with relevant neurologic phenotypes (MBD5, NKX2-1). These specific variants have not been reported previously, but their expected molecular consequences are consistent with previously reported cases. As the use of LRS and OGM technologies for clinical testing increases and data analyses become more standardized, these methods, together with multiomic data to validate the effects of noncoding variants, will improve diagnostic yield and increase the proportion of probands with detectable pathogenic variants in known genes implicated in neurogenetic disease.
Affiliation(s)
- Kristen L Sund
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Jie Liu
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Joyce Lee
- Bionano Genomics, San Diego, California, USA
- John Garbe
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, Minnesota, USA
- Zakia Abdelhamed
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Chelsey Maag
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Barbara Hallinan
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Steven W Wu
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Ethan Sperry
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Archana Deshpande
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, Minnesota, USA
- Rolf Stottmann
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Teresa A Smolarek
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Lisa M Dyer
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
- Matthew S Hestand
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio, USA
3
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans. bioRxiv: The Preprint Server for Biology 2024:2024.03.12.584650. [PMID: 38654825 PMCID: PMC11037872 DOI: 10.1101/2024.03.12.584650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because the family is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary at two TBC1D3 clusters, where haplotypes are highly variable in both copy number (differing by as many as 20 copies) and structure (90% structural heterozygosity). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, to a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Max L. Dougherty
- Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Stephen Chang
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
- Tomi Pastinen
- Department of Pediatrics, Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA
- Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
4
Paparella A, L’Abbate A, Palmisano D, Chirico G, Porubsky D, Catacchio CR, Ventura M, Eichler EE, Maggiolini FAM, Antonacci F. Structural Variation Evolution at the 15q11-q13 Disease-Associated Locus. Int J Mol Sci 2023; 24:15818. [PMID: 37958807 PMCID: PMC10648317 DOI: 10.3390/ijms242115818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] [Open Access]
Abstract
The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader-Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species, primarily through the accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies the region's association with human susceptibility to recurrent disease-associated rearrangements.
Affiliation(s)
- Annalisa Paparella
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Alberto L’Abbate
- Institute of Biomembranes, Bioenergetics, and Molecular Biotechnology (IBIOM), 70125 Bari, Italy
- Donato Palmisano
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Gerardina Chirico
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Claudia R. Catacchio
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Howard Hughes Medical Institute (HHMI), University of Washington, Seattle, WA 98195, USA
- Flavia A. M. Maggiolini
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Research Centre for Viticulture and Enology, Council for Agricultural Research and Economics (CREA), 70010 Bari, Italy
- Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
5
Gilmore RB, Gorka D, Stoddard CE, Cotney JL, Chamberlain SJ. Generation of isogenic models of Angelman syndrome and Prader-Willi syndrome in CRISPR/Cas9-engineered human embryonic stem cells. bioRxiv: The Preprint Server for Biology 2023:2023.08.30.555563. [PMID: 37693591 PMCID: PMC10491257 DOI: 10.1101/2023.08.30.555563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Angelman Syndrome (AS) and Prader-Willi Syndrome (PWS), two distinct neurodevelopmental disorders, result from loss of expression from imprinted genes in the chromosome 15q11-q13 locus, most commonly caused by a megabase-scale deletion on the maternal or paternal allele, respectively. Each occurs at an approximate incidence of 1/15,000 to 1/30,000 live births and has a range of debilitating phenotypes. Patient-derived induced pluripotent stem cells (iPSCs) have been valuable tools to understand human-relevant gene regulation at this locus and have contributed to the development of therapeutic approaches for AS. Nonetheless, gaps remain in our understanding of how these deletions contribute to dysregulation and phenotypes of AS and PWS. Variability across cell lines due to donor differences, reprogramming methods, and genetic background makes it challenging to fill these gaps in knowledge without substantially increasing the number of cell lines used in the analyses. Isogenic cell lines that differ only by the genetic mutation causing the disease can ease this burden without requiring such a large number of cell lines. Here, we describe the development of isogenic human embryonic stem cell (hESC) lines modeling the most common genetic subtypes of AS and PWS. These lines allow for facile interrogation of allele-specific gene regulation at the chromosome 15q11-q13 locus. Additionally, these lines are an important resource to identify and test targeted therapeutic approaches for patients with AS and PWS.
Affiliation(s)
- Rachel B Gilmore
- Department of Genetics and Genome Sciences, UConn Health; Farmington, CT, USA
- Dea Gorka
- Department of Genetics and Genome Sciences, UConn Health; Farmington, CT, USA
- Justin L Cotney
- Department of Genetics and Genome Sciences, UConn Health; Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Stormy J Chamberlain
- Department of Genetics and Genome Sciences, UConn Health; Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
6
Porubsky D, Harvey WT, Rozanski AN, Ebler J, Höps W, Ashraf H, Hasenfeld P, Paten B, Sanders AD, Marschall T, Korbel JO, Eichler EE. Inversion polymorphism in a complete human genome assembly. Genome Biol 2023; 24:100. [PMID: 37122002 PMCID: PMC10150506 DOI: 10.1186/s13059-023-02919-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/05/2022] [Accepted: 03/31/2023] [Indexed: 05/02/2023]
Abstract
The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared the results with those obtained against the GRCh38 reference. We find a ~21% increase in sensitivity, improving the mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Helmholtz Association, 10115, Berlin, Germany
- Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Charité-Universitätsmedizin, 10117, Berlin, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA
7
Porubsky D, Höps W, Ashraf H, Hsieh P, Rodriguez-Martin B, Yilmaz F, Ebler J, Hallast P, Maria Maggiolini FA, Harvey WT, Henning B, Audano PA, Gordon DS, Ebert P, Hasenfeld P, Benito E, Zhu Q, Lee C, Antonacci F, Steinrücken M, Beck CR, Sanders AD, Marschall T, Eichler EE, Korbel JO. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 2022; 185:1986-2005.e26. [PMID: 35525246 PMCID: PMC9563103 DOI: 10.1016/j.cell.2022.04.017] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 11/11/2021] [Revised: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 12/13/2022]
Abstract
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Flavia Angela Maria Maggiolini
- Department of Biology, University of Bari "Aldo Moro", 70125 Bari, Italy; Consiglio per la Ricerca in Agricoltura e l'Analisi dell'Economia Agraria-Centro di Ricerca Viticoltura ed Enologia (CREA-VE), Via Casamassima 148, 70010 Turi, Italy
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Barbara Henning
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Eva Benito
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA; The University of Connecticut Health Center, 400 Farmington Rd., Farmington, CT 06032, USA
- Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; Charité-Universitätsmedizin, Berlin, Berlin, Germany
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
8
Gershman A, Sauria MEG, Guitart X, Vollger MR, Hook PW, Hoyt SJ, Jain M, Shumate A, Razaghi R, Koren S, Altemose N, Caldas GV, Logsdon GA, Rhie A, Eichler EE, Schatz MC, O'Neill RJ, Phillippy AM, Miga KH, Timp W. Epigenetic patterns in a complete human genome. Science 2022; 376:eabj5089. [PMID: 35357915 PMCID: PMC9170183 DOI: 10.1126/science.abj5089] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Indexed: 12/11/2022]
Abstract
The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.
Affiliation(s)
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Michael E G Sauria
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Savannah J Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Nicolas Altemose
- Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA
- Gina V Caldas
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Michael C Schatz
- Department of Biology and Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
9
Mostovoy Y, Yilmaz F, Chow SK, Chu C, Lin C, Geiger EA, Meeks NJL, Chatfield KC, Coughlin CR, Surti U, Kwok PY, Shaikh TH. Genomic regions associated with microdeletion/microduplication syndromes exhibit extreme diversity of structural variation. Genetics 2021; 217:6066166. [PMID: 33724415 DOI: 10.1093/genetics/iyaa038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/05/2020] [Accepted: 12/18/2020] [Indexed: 11/12/2022]
Abstract
Segmental duplications (SDs) are a class of long, repetitive DNA elements whose paralogs share a high level of sequence similarity. SDs mediate chromosomal rearrangements that lead to structural variation in the general population as well as to genomic disorders associated with multiple congenital anomalies, including the 7q11.23 (Williams-Beuren Syndrome, WBS), 15q13.3, and 16p12.2 microdeletion syndromes. Population-level characterization of SDs has generally been lacking because most techniques for analyzing these complex regions are both labor- and cost-intensive. In this study, we used a high-throughput, single-molecule, long-range optical mapping approach to genotype complex structural variation. We characterized SDs and identified novel structural variants (SVs) at 7q11.23, 15q13.3, and 16p12.2 using optical mapping data from 154 phenotypically normal individuals from 26 populations comprising five super-populations. We detected several novel SVs at each locus, some of which had significantly different prevalence between populations. Additionally, we localized the microdeletion breakpoints to specific paralogous duplicons within complex SDs in two patients with WBS, one patient with 15q13.3 microdeletion syndrome, and one patient with 16p12.2 microdeletion syndrome. The population-level data presented here highlight the extreme diversity of large and complex SVs within SD-containing regions. The approach we outline will greatly facilitate investigation of the role of inter-SD structural variation as a driver of chromosomal rearrangements and genomic disorders.
Affiliation(s)
- Yulia Mostovoy
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
- Feyza Yilmaz
- Department of Integrative Biology, University of Colorado Denver, Denver, CO 80204, USA; Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Stephen K Chow
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
- Catherine Chu
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
- Chin Lin
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
- Elizabeth A Geiger
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Naomi J L Meeks
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Kathryn C Chatfield
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Pediatrics, Section of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Curtis R Coughlin
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Urvashi Surti
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Pui-Yan Kwok
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA; Department of Dermatology, UCSF School of Medicine, San Francisco, CA 94143, USA; Institute for Human Genetics, UCSF School of Medicine, San Francisco, CA 94143, USA
- Tamim H Shaikh
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
10
Velandia-Huerto CA, Fallmann J, Stadler PF. miRNAture-Computational Detection of microRNA Candidates. Genes (Basel) 2021; 12:348. [PMID: 33673400 PMCID: PMC7996739 DOI: 10.3390/genes12030348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022]
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including parameter-optimized blast, nhmmer, or cmsearch runs designed to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of features typical of an RNA family and/or analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates the homology search and validation steps. The miRNAture pipeline yields very good results for a large number of "typical" miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs derived from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
Affiliation(s)
- Cristian A. Velandia-Huerto
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, CO-111321 Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM 87501, USA
11
Single-cell strand sequencing of a macaque genome reveals multiple nested inversions and breakpoint reuse during primate evolution. Genome Res 2020; 30:1680-1693. [PMID: 33093070 PMCID: PMC7605249 DOI: 10.1101/gr.265322.120] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2020] [Accepted: 09/02/2020] [Indexed: 12/14/2022]
Abstract
Rhesus macaque is an Old World monkey that shared a common ancestor with humans ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 inversions ranging from 859 bp to 92 Mbp, an eightfold increase over the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 of these 19 cases, the Hominidae orientation of the disease-associated region is derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that now predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, some of which show evidence of differential expression between human and macaque cell lines, highlighting candidates that might have contributed to the evolution of species-specific features. This study presents the most accurate fine-scale map of inversions between human and macaque, obtained with a two-pronged integrative approach (single-cell strand sequencing and cytogenetics), and represents a valuable resource toward understanding the biology and evolution of primate species.
12
Porubsky D, Sanders AD, Höps W, Hsieh P, Sulovari A, Li R, Mercuri L, Sorensen M, Murali SC, Gordon D, Cantsilieris S, Pollen AA, Ventura M, Antonacci F, Marschall T, Korbel JO, Eichler EE. Recurrent inversion toggling and great ape genome evolution. Nat Genet 2020; 52:849-858. [PMID: 32541924 PMCID: PMC7415573 DOI: 10.1038/s41588-020-0646-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/10/2019] [Accepted: 05/15/2020] [Indexed: 01/14/2023]
Abstract
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. Using single-cell DNA template strand and long-read sequencing, we increased the number of previously reported great ape inversions sixfold (n = 1,069). We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Ashley D Sanders
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ludovica Mercuri
- Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Shwetha C Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Centre for Eye Research Australia, Department of Surgery (Ophthalmology), University of Melbourne, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Alex A Pollen
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Mario Ventura
- Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Francesca Antonacci
- Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
13
Evolutionary Dynamics of the POTE Gene Family in Human and Nonhuman Primates. Genes (Basel) 2020; 11:genes11020213. [PMID: 32085667 PMCID: PMC7073761 DOI: 10.3390/genes11020213] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/17/2019] [Revised: 02/06/2020] [Accepted: 02/13/2020] [Indexed: 12/20/2022]
Abstract
POTE (prostate, ovary, testis, and placenta expressed) genes belong to a primate-specific gene family expressed in prostate, ovary, and testis as well as in several cancers, including breast, prostate, and lung cancers. Because of their tumor-specific expression, POTEs are potential oncogenes, therapeutic targets, and biomarkers for these malignancies. This gene family maps within human and primate segmental duplications, with a copy number ranging from two to 14 in different species. Because of the high sequence identity among the gene copies, specific efforts are needed to assemble these loci in order to correctly define the organization and evolution of the gene family. Using single-molecule, real-time (SMRT) sequencing, in silico analyses, and molecular cytogenetics, we characterized the structure, copy number, and chromosomal distribution of the POTE genes, as well as their expression in normal and disease tissues, and provided a comparative analysis of POTE organization and gene structure in primate genomes. We were able, for the first time, to de novo sequence and assemble a POTE tandem duplication in marmoset that is misassembled and collapsed in the reference genome, thus revealing the presence of a second POTE copy. Taken together, our findings provide comprehensive insights into the evolutionary dynamics of the primate-specific POTE gene family, involving gene duplications, deletions, and long interspersed nuclear element (LINE) transpositions that explain the current repertoire of these genes in human and primate genomes.
14
Bao S, Zhao H, Yuan J, Fan D, Zhang Z, Su J, Zhou M. Computational identification of mutator-derived lncRNA signatures of genome instability for improving the clinical outcome of cancers: a case study in breast cancer. Brief Bioinform 2019; 21:1742-1755. [PMID: 31665214 DOI: 10.1093/bib/bbz118] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 06/24/2019] [Revised: 07/29/2019] [Accepted: 08/12/2019] [Indexed: 12/24/2022]
Abstract
Emerging evidence has revealed critical roles of long non-coding RNAs (lncRNAs) in genomic instability. However, the identification of genome instability-associated lncRNAs and their clinical significance in cancers remain largely unexplored. Here, we developed a mutator hypothesis-derived computational framework combining lncRNA expression profiles and somatic mutation profiles in a tumor genome and, as a case study, identified 128 novel genomic instability-associated lncRNAs in breast cancer. We then identified a genome instability-derived, two-lncRNA-based gene signature (GILncSig) that stratified patients into high- and low-risk groups with significantly different outcomes and was further validated in multiple independent patient cohorts. Furthermore, the GILncSig correlated with genomic mutation rate in both ovarian cancer and breast cancer, indicating its potential as a measure of the degree of genome instability. The GILncSig was able to divide TP53 wild-type patients into two risk groups, with the low-risk group showing significantly improved outcome and the high-risk group showing no significant difference compared with patients carrying TP53 mutations. In summary, this study provides a critical approach and resource for further studies examining the role of lncRNAs in genome instability and introduces a potential new avenue for identifying genomic instability-associated cancer biomarkers.
Affiliation(s)
- Siqi Bao
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Hengqiang Zhao
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jian Yuan
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Dandan Fan
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jianzhong Su
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Meng Zhou
- School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, P. R. China