1
Soto DC, Uribe-Salazar JM, Kaya G, Valdarrago R, Sekar A, Haghani NK, Hino K, La GN, Mariano NAF, Ingamells C, Baraban AE, Turner TN, Green ED, Simó S, Quon G, Andrés AM, Dennis MY. Gene expansions contributing to human brain evolution. bioRxiv 2024:2024.09.26.615256. [PMID: 39386494 PMCID: PMC11463660 DOI: 10.1101/2024.09.26.615256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Genomic drivers of human-specific neurological traits remain largely undiscovered. Duplicated genes expanded uniquely in the human lineage likely contributed to brain evolution, including the increased complexity of synaptic connections between neurons and the dramatic expansion of the neocortex. Discovering duplicate genes is challenging because the similarity of paralogs makes them prone to sequence-assembly errors. To mitigate this issue, we analyzed a complete telomere-to-telomere human genome sequence (T2T-CHM13) and identified 213 duplicated gene families likely containing human-specific paralogs (>98% identity). Positing that genes important in universal human brain features should exist with at least one copy in all modern humans and exhibit expression in the brain, we narrowed in on 362 paralogs with at least one copy across thousands of ancestrally diverse genomes and present in human brain transcriptomes. Of these, 38 paralogs co-express in gene modules enriched for autism-associated genes and potentially contribute to human language and cognition. We narrowed in on 13 duplicate gene families with human-specific paralogs that are fixed among modern humans and show convincing brain expression patterns. Long-read DNA sequencing revealed hidden variation across 200 modern humans of diverse ancestries, uncovering signatures of selection not previously identified, including possible balancing selection of CD8B. To understand the roles of duplicated genes in brain development, we generated zebrafish CRISPR "knockout" models of nine orthologs and transiently introduced mRNA-encoding paralogs, effectively "humanizing" the larvae. Morphometric, behavioral, and single-cell RNA-seq screening highlighted, for the first time, a possible role for GPR89B in dosage-mediated brain expansion and FRMPD2B function in altered synaptic signaling, both hallmark features of the human brain. Our holistic approach provides important insights into human brain evolution as well as a resource to the community for studying additional gene expansion drivers of human brain evolution.
Affiliation(s)
- Daniela C. Soto
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - José M. Uribe-Salazar
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Gulhan Kaya
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Ricardo Valdarrago
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
| | - Aarthi Sekar
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Nicholas K. Haghani
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Keiko Hino
- Department of Cell Biology & Human Anatomy, University of California, Davis, CA 95616, USA
| | - Gabriana N. La
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Natasha Ann F. Mariano
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
- Postbaccalaureate Research Education Program, University of California, Davis, CA 95616, USA
| | - Cole Ingamells
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Aidan E. Baraban
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Tychele N. Turner
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Eric D. Green
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Sergi Simó
- Department of Cell Biology & Human Anatomy, University of California, Davis, CA 95616, USA
| | - Gerald Quon
- Genome Center, University of California, Davis, CA 95616, USA
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
| | - Aida M. Andrés
- UCL Genetics Institute, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Megan Y. Dennis
- Department of Biochemistry & Molecular Medicine, MIND Institute, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
2
Guitart X, Porubsky D, Yoo D, Dougherty ML, Dishuck PC, Munson KM, Lewis AP, Hoekzema K, Knuth J, Chang S, Pastinen T, Eichler EE. Independent expansion, selection and hypervariability of the TBC1D3 gene family in humans. bioRxiv 2024:2024.03.12.584650. [PMID: 38654825 PMCID: PMC11037872 DOI: 10.1101/2024.03.12.584650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary along two TBC1D3 clusters where human haplotypes are highly variable in copy number, differing by as many as 20 copies, and structure (structural heterozygosity 90%). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.
Affiliation(s)
- Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Max L. Dougherty
- Tisch Cancer Institute, Division of Hematology and Medical Oncology, The Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Stephen Chang
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
| | - Tomi Pastinen
- Department of Pediatrics, Genomic Medicine Center, Children’s Mercy Kansas City, Kansas City, MO, USA
- Department of Pediatrics, School of Medicine, University of Missouri Kansas City, Kansas City, MO, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol 2023; 41:1474-1482. [PMID: 36797493 PMCID: PMC10427740 DOI: 10.1038/s41587-023-01662-6] [Citation(s) in RCA: 109] [Impact Index Per Article: 109.0] [Received: 06/24/2022] [Accepted: 01/03/2023] [Indexed: 02/18/2023]
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
Affiliation(s)
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
4
Wang H, Makowski C, Zhang Y, Qi A, Kaufmann T, Smeland OB, Fiecas M, Yang J, Visscher PM, Chen CH. Chromosomal inversion polymorphisms shape human brain morphology. Cell Rep 2023; 42:112896. [PMID: 37505983 PMCID: PMC10508191 DOI: 10.1016/j.celrep.2023.112896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 06/27/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
The impact of chromosomal inversions on human brain morphology remains underexplored. We studied 35 common inversions classified from genotypes of 33,018 adults with European ancestry. The inversions at 2p22.3, 16p11.2, and 17q21.31 reach genome-wide significance, followed by 8p23.1 and 6p21.33, in their association with cortical and subcortical morphology. The 17q21.31, 8p23.1, and 16p11.2 regions comprise the LRRC37, OR7E, and NPIP duplicated gene families. We find the 17q21.31 MAPT inversion region, known for harboring neurological risk, to be the most salient locus among common variants for shaping and patterning the cortex. Overall, we observe the inverted orientations decreasing brain size, with the exception that the 2p22.3 inversion is associated with increased subcortical volume and the 8p23.1 inversion is associated with increased motor cortex. These significant inversions are in the genomic hotspots of neuropsychiatric loci. Our findings are generalizable to 3,472 children and demonstrate inversions as essential genetic variation to understand human brain phenotypes.
Affiliation(s)
- Hao Wang
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Carolina Makowski
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Yanxiao Zhang
- Ludwig Institute for Cancer Research, La Jolla, CA 92093, USA; School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
| | - Anna Qi
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
| | - Tobias Kaufmann
- Department of Psychiatry and Psychotherapy, Tübingen Center for Mental Health, University of Tübingen, 72076 Tübingen, Germany; Norwegian Centre for Mental Disorders Research, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway
| | - Olav B Smeland
- Norwegian Centre for Mental Disorders Research, Oslo University Hospital and University of Oslo, 0450 Oslo, Norway
| | - Mark Fiecas
- Division of Biostatistics, University of Minnesota School of Public Health, Minneapolis, MN 55455, USA
| | - Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China; Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
| | - Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Chi-Hua Chen
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
5
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 281] [Impact Index Per Article: 281.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | - Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | | | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
| | - Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
| | | | - Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany.
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA.
|
6
|
Porubsky D, Harvey WT, Rozanski AN, Ebler J, Höps W, Ashraf H, Hasenfeld P, Paten B, Sanders AD, Marschall T, Korbel JO, Eichler EE. Inversion polymorphism in a complete human genome assembly. Genome Biol 2023; 24:100. [PMID: 37122002 PMCID: PMC10150506 DOI: 10.1186/s13059-023-02919-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/05/2022] [Accepted: 03/31/2023] [Indexed: 05/02/2023] Open
Abstract
The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Hufsah Ashraf
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Helmholtz Association, 10115, Berlin, Germany
- Berlin Institute of Health (BIH), 10178, Berlin, Germany
- Charité-Universitätsmedizin, 10117, Berlin, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Moorenstraße 5, 40225, Düsseldorf, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA.
|
7
|
Chao KH, Zimin AV, Pertea M, Salzberg SL. The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual. G3 (BETHESDA, MD.) 2023; 13:jkac321. [PMID: 36630290 PMCID: PMC9997556 DOI: 10.1093/g3journal/jkac321] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/08/2022] [Revised: 10/27/2022] [Accepted: 11/03/2022] [Indexed: 01/12/2023]
Abstract
We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Aleksey V Zimin
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
|
8
|
Ehrlich L, Prakash SK. Copy-number variation in congenital heart disease. Curr Opin Genet Dev 2022; 77:101986. [PMID: 36202051 DOI: 10.1016/j.gde.2022.101986] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/08/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 01/27/2023]
Abstract
Genomic copy-number variants (CNVs) contribute to as many congenital heart disease (CHD) cases (10-15%) as chromosomal aberrations or single-gene mutations and influence clinical outcomes. CNVs in a few genomic hotspots (1q21.1, 2q13, 8p23.1, 11q24, 15q11.2, 16p11.2, and 22q11.2) are recurrently enriched in CHD cohorts and affect dosage-sensitive transcriptional regulators that are required for cardiac development. Reduced penetrance and pleiotropic effects on brain and heart development are common features of these CNVs. Therefore, additional genetic 'hits,' such as a second CNV or gene mutation, are probably required to cause CHD in most cases. Integrative analysis of CNVs, genome sequence, epigenetic alterations, and gene function will be required to delineate the complete genetic landscape of CHD.
Affiliation(s)
- Laurent Ehrlich
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, 6431 Fannin Street, Houston, TX 77030, USA
| | - Siddharth K Prakash
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, 6431 Fannin Street, Houston, TX 77030, USA.
|
9
|
Integration of Hi-C with short and long-read genome sequencing reveals the structure of germline rearranged genomes. Nat Commun 2022; 13:6470. [PMID: 36309531 PMCID: PMC9617858 DOI: 10.1038/s41467-022-34053-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/17/2021] [Accepted: 10/07/2022] [Indexed: 12/25/2022] Open
Abstract
Structural variants are a common cause of disease and contribute to a large extent to inter-individual variability, but their detection and interpretation remain a challenge. Here, we investigate 11 individuals with complex genomic rearrangements including germline chromothripsis by combining short- and long-read genome sequencing (GS) with Hi-C. Large-scale genomic rearrangements are identified in Hi-C interaction maps, allowing for an independent assessment of breakpoint calls derived from the GS methods, resulting in >300 genomic junctions. Based on a comprehensive breakpoint detection and Hi-C, we achieve a reconstruction of whole rearranged chromosomes. Integrating information on the three-dimensional organization of chromatin, we observe that breakpoints occur more frequently than expected in lamina-associated domains (LADs) and that a majority reshuffle topologically associating domains (TADs). By applying phased RNA-seq, we observe an enrichment of genes showing allelic imbalanced expression (AIG) within 100 kb around the breakpoints. Interestingly, the AIGs hit by a breakpoint (19/22) display both up- and downregulation, thereby suggesting different mechanisms at play, such as gene disruption and rearrangements of regulatory information. However, the majority of interpretable genes located 200 kb around a breakpoint do not show significant expression changes. Thus, there is an overall robustness in the genome towards large-scale chromosome rearrangements.
|
10
|
Campoy E, Puig M, Yakymenko I, Lerga-Jaso J, Cáceres M. Genomic architecture and functional effects of potential human inversion supergenes. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210209. [PMID: 35694745 PMCID: PMC9189494 DOI: 10.1098/rstb.2021.0209] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022] Open
Abstract
Supergenes are involved in adaptation in multiple organisms, but they are little known in humans. Genomic inversions are the most common mechanism of supergene generation and maintenance. Here, we review the information about two large inversions that are the best examples of potential human supergenes. In addition, we do an integrative analysis of the newest data to understand better their functional effects and underlying genetic changes. We have found that the highly divergent haplotypes of the 17q21.31 inversion of approximately 1.5 Mb have multiple phenotypic associations, with consistent effects in brain-related traits, red and white blood cells, lung function, male and female characteristics and disease risk. By combining gene expression and nucleotide variation data, we also analysed the molecular differences between haplotypes, including gene duplications, amino acid substitutions and regulatory changes, and identify CRHR1, KANSL1 and MAPT as good candidates to be responsible for these phenotypes. The situation is more complex for the 8p23.1 inversion, where there is no clear genetic differentiation. However, the inversion is associated with several related phenotypes and gene expression differences that could be linked to haplotypes specific of one orientation. Our work, therefore, contributes to the characterization of both exceptional variants and illustrates the important role of inversions. This article is part of the theme issue 'Genomic architecture of supergenes: causes and evolutionary consequences'.
Affiliation(s)
- Elena Campoy
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
| | - Marta Puig
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain; Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
| | - Illya Yakymenko
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
| | - Jon Lerga-Jaso
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
| | - Mario Cáceres
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain; ICREA, Barcelona, Spain
|
11
|
Porubsky D, Höps W, Ashraf H, Hsieh P, Rodriguez-Martin B, Yilmaz F, Ebler J, Hallast P, Maria Maggiolini FA, Harvey WT, Henning B, Audano PA, Gordon DS, Ebert P, Hasenfeld P, Benito E, Zhu Q, Lee C, Antonacci F, Steinrücken M, Beck CR, Sanders AD, Marschall T, Eichler EE, Korbel JO. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 2022; 185:1986-2005.e26. [PMID: 35525246 PMCID: PMC9563103 DOI: 10.1016/j.cell.2022.04.017] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Received: 11/11/2021] [Revised: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 12/13/2022]
Abstract
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Flavia Angela Maria Maggiolini
- Department of Biology, University of Bari "Aldo Moro", 70125 Bari, Italy; Consiglio per la Ricerca in Agricoltura e l'Analisi dell'Economia Agraria-Centro di Ricerca Viticoltura ed Enologia (CREA-VE), Via Casamassima 148, 70010 Turi, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Barbara Henning
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Eva Benito
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | | | - Matthias Steinrücken
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA; The University of Connecticut Health Center, 400 Farmington Rd., Farmington, CT 06032, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; Charité-Universitätsmedizin, Berlin, Berlin, Germany
| | - Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 5, 40225 Düsseldorf, Germany.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
12
|
Liu S, Chen H, Ouyang J, Huang M, Zhang H, Zheng S, Xi S, Tang H, Gao Y, Xiong Y, Cheng D, Chen K, Liu B, Li W, Ren J, Yan X, Mao H. A high-quality assembly reveals genomic characteristics, phylogenetic status, and causal genes for leucism plumage of Indian peafowl. Gigascience 2022; 11:giac018. [PMID: 35383847 PMCID: PMC8985102 DOI: 10.1093/gigascience/giac018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2021] [Revised: 11/15/2021] [Accepted: 02/09/2022] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND The dazzling phenotypic characteristics of male Indian peafowl (Pavo cristatus) are attractive both to the female of the species and to humans. However, little is known about the evolution of the phenotype and phylogeny of these birds at the whole-genome level. So far, there are no reports regarding the genetic mechanism of the formation of leucism plumage in this variant of Indian peafowl. RESULTS A draft genome of Indian peafowl was assembled, with a genome size of 1.05 Gb (the sequencing depth is 362×), and contig and scaffold N50 were up to 6.2 and 11.4 Mb, respectively. Compared with other birds, Indian peafowl showed changes in terms of metabolism, immunity, and skeletal and feather development, which provided a novel insight into the phenotypic evolution of peafowl, such as the large body size and feather morphologies. Moreover, we determined that the phylogeny of Indian peafowl was more closely linked to turkey than chicken. Specifically, we first identified that PMEL was a potential causal gene leading to the formation of the leucism plumage variant in Indian peafowl. CONCLUSIONS This study provides an Indian peafowl genome of high quality, as well as a novel understanding of phenotypic evolution and phylogeny of Indian peafowl. These results provide a valuable reference for the study of avian genome evolution. Furthermore, the discovery of the genetic mechanism for the development of leucism plumage is both a breakthrough in the exploration of peafowl plumage and also offers clues and directions for further investigations of the avian plumage coloration and artificial breeding in peafowl.
Affiliation(s)
- Shaojuan Liu
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Hao Chen
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Jing Ouyang
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Min Huang
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Hui Zhang
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Sumei Zheng
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Suwang Xi
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang 330045, China
| | - Hongbo Tang
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Yuren Gao
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Yanpeng Xiong
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Di Cheng
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang 330045, China
| | - Kaifeng Chen
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang 330045, China
| | - Bingbing Liu
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Wanbo Li
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen 361021, China
| | - Jun Ren
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China
| | - Xueming Yan
- College of Life Science, Jiangxi Science & Technology Normal University, Nanchang 330013, China
| | - Huirong Mao
- College of Animal Science and Technology, Jiangxi Agricultural University, Nanchang 330045, China
|
13
|
Redaelli S, Conconi D, Sala E, Villa N, Crosti F, Roversi G, Catusi I, Valtorta C, Recalcati MP, Dalprà L, Lavitrano M, Bentivegna A. Characterization of Chromosomal Breakpoints in 12 Cases with 8p Rearrangements Defines a Continuum of Fragility of the Region. Int J Mol Sci 2022; 23:ijms23063347. [PMID: 35328767 PMCID: PMC8954119 DOI: 10.3390/ijms23063347] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/09/2022] [Revised: 03/16/2022] [Accepted: 03/17/2022] [Indexed: 12/29/2022] Open
Abstract
Improvements in microarray-based comparative genomic hybridization technology have allowed for high-resolution detection of genome wide copy number alterations, leading to a better definition of rearrangements and supporting the study of pathogenesis mechanisms. In this study, we focused our attention on chromosome 8p. We report 12 cases of 8p rearrangements, analyzed by molecular karyotype, evidencing a continuum of fragility that involves the entire short arm. The breakpoints seem more concentrated in three intervals: one at the telomeric end, the others at 8p23.1, close to the beta-defensin gene cluster and olfactory receptor low-copy repeats. Hypothetical mechanisms for all cases are described. Our data extend the cohort of published patients with 8p aberrations and highlight the need to pay special attention to these sequences due to the risk of formation of new chromosomal aberrations with pathological effects.
Affiliation(s)
- Serena Redaelli
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
| | - Donatella Conconi
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
- Correspondence: (D.C.); (A.B.)
| | - Elena Sala
- Medical Genetics Laboratory, Clinical Pathology Department, S. Gerardo Hospital, 20900 Monza, Italy; (E.S.); (N.V.); (F.C.)
| | - Nicoletta Villa
- Medical Genetics Laboratory, Clinical Pathology Department, S. Gerardo Hospital, 20900 Monza, Italy; (E.S.); (N.V.); (F.C.)
| | - Francesca Crosti
- Medical Genetics Laboratory, Clinical Pathology Department, S. Gerardo Hospital, 20900 Monza, Italy; (E.S.); (N.V.); (F.C.)
| | - Gaia Roversi
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
- Medical Genetics Laboratory, Clinical Pathology Department, S. Gerardo Hospital, 20900 Monza, Italy; (E.S.); (N.V.); (F.C.)
| | - Ilaria Catusi
- Medical Cytogenetics Laboratory, Istituto Auxologico Italiano IRCCS, 20095 Cusano Milanino, Italy; (I.C.); (C.V.); (M.P.R.)
| | - Chiara Valtorta
- Medical Cytogenetics Laboratory, Istituto Auxologico Italiano IRCCS, 20095 Cusano Milanino, Italy; (I.C.); (C.V.); (M.P.R.)
| | - Maria Paola Recalcati
- Medical Cytogenetics Laboratory, Istituto Auxologico Italiano IRCCS, 20095 Cusano Milanino, Italy; (I.C.); (C.V.); (M.P.R.)
| | - Leda Dalprà
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
- Medical Genetics Laboratory, Clinical Pathology Department, S. Gerardo Hospital, 20900 Monza, Italy; (E.S.); (N.V.); (F.C.)
| | - Marialuisa Lavitrano
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
| | - Angela Bentivegna
- School of Medicine and Surgery, University of Milano-Bicocca, 20900 Monza, Italy; (S.R.); (G.R.); (L.D.); (M.L.)
- Correspondence: (D.C.); (A.B.)
|
14
|
Guo Q, Atkinson SD, Xiao B, Zhai Y, Bartholomew JL, Gu Z. A myxozoan genome reveals mosaic evolution in a parasitic cnidarian. BMC Biol 2022; 20:51. [PMID: 35177085 PMCID: PMC8855578 DOI: 10.1186/s12915-022-01249-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/21/2021] [Accepted: 02/07/2022] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Parasite evolution has been conceptualized as a process of genetic loss and simplification. Contrary to this model, there is evidence of expansion and conservation of gene families related to essential functions of parasitism in some parasite genomes, reminiscent of widespread mosaic evolution, where subregions of a genome have different rates of evolutionary change. We found evidence of mosaic genome evolution in the cnidarian Myxobolus honghuensis, a myxozoan parasite of fish, with extremely simple morphology. RESULTS We compared M. honghuensis with other myxozoans and free-living cnidarians, and determined that it has a relatively larger myxozoan genome (206 Mb), which is less reduced and less compact due to gene retention, large introns, transposon insertion, but not polyploidy. Relative to other metazoans, the M. honghuensis genome is depleted of neural genes and has only the simplest animal immune components. Conversely, it has relatively more genes involved in stress resistance, tissue invasion, energy metabolism, and cellular processes compared to other myxozoans and free-living cnidarians. We postulate that the expansion of these gene families is the result of evolutionary adaptations to endoparasitism. M. honghuensis retains genes found in free-living Cnidaria, including a reduced nervous system, myogenic components, ANTP class Homeobox genes, and components of the Wnt and Hedgehog pathways. CONCLUSIONS Our analyses suggest that the M. honghuensis genome evolved as a mosaic of conservative, divergent, depleted, and enhanced genes and pathways. These findings illustrate that myxozoans are not as genetically simple as previously regarded, and the evolution of some myxozoans is driven by both genomic streamlining and expansion.
Affiliation(s)
- Qingxiang Guo
  - Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
  - Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan, 430070, People's Republic of China
- Stephen D Atkinson
  - Department of Microbiology, Oregon State University, Corvallis, OR, 97331, USA
- Bin Xiao
  - Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
  - Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan, 430070, People's Republic of China
- Yanhua Zhai
  - Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
  - Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan, 430070, People's Republic of China
- Jerri L Bartholomew
  - Department of Microbiology, Oregon State University, Corvallis, OR, 97331, USA
- Zemao Gu
  - Department of Aquatic Animal Medicine, College of Fisheries, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
  - Hubei Engineering Technology Research Center for Aquatic Animal Diseases Control and Prevention, Wuhan, 430070, People's Republic of China
|
15
|
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 184] [Impact Index Per Article: 61.3] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
|
16
|
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res 2020; 30:1291-1305. [PMID: 32801147 PMCID: PMC7545148 DOI: 10.1101/gr.263566.120] [Citation(s) in RCA: 361] [Impact Index Per Article: 90.3] [Received: 03/14/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
Affiliation(s)
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Brian P Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Robert Grothe
  - Pacific Biosciences, Menlo Park, California 94025, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA
|
17
|
Cantsilieris S, Sunkin SM, Johnson ME, Anaclerio F, Huddleston J, Baker C, Dougherty ML, Underwood JG, Sulovari A, Hsieh P, Mao Y, Catacchio CR, Malig M, Welch AE, Sorensen M, Munson KM, Jiang W, Girirajan S, Ventura M, Lamb BT, Conlon RA, Eichler EE. An evolutionary driver of interspersed segmental duplications in primates. Genome Biol 2020; 21:202. [PMID: 32778141 PMCID: PMC7419210 DOI: 10.1186/s13059-020-02074-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/03/2019] [Accepted: 06/08/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.
Affiliation(s)
- Stuart Cantsilieris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
  - Present Address: Centre for Eye Research Australia, Department of Surgery (Ophthalmology), University of Melbourne, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, 3002, Australia
- Matthew E Johnson
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Fabio Anaclerio
  - Department of Biology-Genetics, University of Bari, Bari, Italy
- John Huddleston
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, 98195, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Max L Dougherty
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Jason G Underwood
  - Pacific Biosciences (PacBio) of California, Incorporated, Menlo Park, CA, 94025, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Yafei Mao
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Maika Malig
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
  - Present Address: Department of Molecular and Cellular Biology, University of California, Davis, CA, 95616, USA
  - Present Address: Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, 95616, USA
- AnneMarie E Welch
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
  - Present Address: Brain and Mitochondrial Research, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Weihong Jiang
  - Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Santhosh Girirajan
  - Department of Biochemistry and Molecular Biology, Department of Anthropology, Pennsylvania State University, University Park, PA, 16802, USA
- Mario Ventura
  - Department of Biology-Genetics, University of Bari, Bari, Italy
- Bruce T Lamb
  - Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Ronald A Conlon
  - Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
  - Howard Hughes Medical Institute, University of Washington School of Medicine, 3720 15th Ave NE, S413C, Box 355065, Seattle, WA, 98195-5065, USA
|
18
|
Porubsky D, Sanders AD, Höps W, Hsieh P, Sulovari A, Li R, Mercuri L, Sorensen M, Murali SC, Gordon D, Cantsilieris S, Pollen AA, Ventura M, Antonacci F, Marschall T, Korbel JO, Eichler EE. Recurrent inversion toggling and great ape genome evolution. Nat Genet 2020; 52:849-858. [PMID: 32541924 PMCID: PMC7415573 DOI: 10.1038/s41588-020-0646-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/10/2019] [Accepted: 05/15/2020] [Indexed: 01/14/2023]
Abstract
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
Affiliation(s)
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Ashley D Sanders
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Wolfram Höps
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ruiyang Li
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ludovica Mercuri
  - Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Shwetha C Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- David Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Stuart Cantsilieris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Centre for Eye Research Australia, Department of Surgery (Ophthalmology), University of Melbourne, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Alex A Pollen
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Mario Ventura
  - Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Francesca Antonacci
  - Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
19
|
Hsieh P, Vollger MR, Dang V, Porubsky D, Baker C, Cantsilieris S, Hoekzema K, Lewis AP, Munson KM, Sorensen M, Kronenberg ZN, Murali S, Nelson BJ, Chiatante G, Maggiolini FAM, Blanché H, Underwood JG, Antonacci F, Deleuze JF, Eichler EE. Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes. Science 2019; 366:eaax2083. [PMID: 31624180 DOI: 10.1126/science.aax2083] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 03/03/2019] [Revised: 07/05/2019] [Accepted: 09/12/2019] [Indexed: 01/01/2023]
Abstract
Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
Affiliation(s)
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Vy Dang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Stuart Cantsilieris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Shwetha Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Bradley J Nelson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Giorgia Chiatante
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari, Italy
- Hélène Blanché
  - Fondation Jean Dausset-Centre d'Etude du Polymorphisme Humain, Paris, France
- Jason G Underwood
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Pacific Biosciences (PacBio) of California, Inc., Menlo Park, CA, USA
- Francesca Antonacci
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari, Italy
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
20
|
Shanta O, Noor A, Sebat J. The effects of common structural variants on 3D chromatin structure. BMC Genomics 2020; 21:95. [PMID: 32000688 PMCID: PMC6990566 DOI: 10.1186/s12864-020-6516-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/13/2019] [Accepted: 01/20/2020] [Indexed: 12/28/2022]
Abstract
Background Three-dimensional spatial organization of chromosomes is defined by highly self-interacting regions 0.1–1 Mb in size termed Topological Associating Domains (TADs). Genetic factors that explain dynamic variation in TAD structure are not understood. We hypothesize that common structural variation (SV) in the human population can disrupt regulatory sequences and thereby influence TAD formation. To determine the effects of SVs on 3D chromatin organization, we performed chromosome conformation capture sequencing (Hi-C) of lymphoblastoid cell lines from 19 subjects for which SVs had been previously characterized in the 1000 genomes project. We tested the effects of common deletion polymorphisms on TAD structure by linear regression analysis of nearby quantitative chromatin interactions (contacts) within 240 kb of the deletion, and we specifically tested the hypothesis that deletions at TAD boundaries (TBs) could result in large-scale alterations in chromatin conformation. Results Large (> 10 kb) deletions had significant effects on long-range chromatin interactions. Deletions were associated with increased contacts that span the deleted region, and this effect was driven by large deletions that were not located within a TAD boundary (nonTB). Some deletions at TBs, including an 80 kb deletion of the genes CFHR1 and CFHR3, had detectable effects on chromatin contacts. However, for TB deletions overall, we did not detect a pattern of effects that was consistent in magnitude or direction. Large inversions in the population had a distinguishable signature characterized by a rearrangement of contacts that span their breakpoints. Conclusions Our study demonstrates that common SVs in the population impact long-range chromatin structure, and deletions and inversions have distinct signatures. However, the effects that we observe are subtle and variable between loci. Genome-wide analysis of chromatin conformation in large cohorts will be needed to quantify the influence of common SVs on chromatin structure.
Affiliation(s)
- Omar Shanta
  - Department of Electrical and Computer Engineering, UCSD, San Diego, CA, USA
- Amina Noor
  - Beyster Center for Genomics of Psychiatric Diseases, Department of Psychiatry, UCSD, San Diego, CA, USA
- Jonathan Sebat
  - Beyster Center for Genomics of Psychiatric Diseases, Department of Psychiatry, UCSD, San Diego, CA, USA
  - Department of Cellular and Molecular Medicine, UCSD, San Diego, CA, USA
  - Department of Pediatrics, UCSD, San Diego, CA, USA
|
21
|
Alsamman AM, Ibrahim SD, Hamwieh A. KASPspoon: an in vitro and in silico PCR analysis tool for high-throughput SNP genotyping. Bioinformatics 2019; 35:3187-3190. [PMID: 30624621 PMCID: PMC6735863 DOI: 10.1093/bioinformatics/btz004] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2018] [Revised: 12/15/2018] [Accepted: 01/04/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Fine mapping has become a routine step following quantitative trait loci (QTL) mapping studies to shrink the size of genomic segments containing causal variants. The availability of whole genome sequences can facilitate the development of high marker density and predict gene content in genomic segments of interest. Correlating the genetic and physical positions of these loci requires handling different experimental genetic data types and ultimately converting them into positioning markers using a routine and efficient tool. RESULTS To convert classical QTL markers into KASP assay primers, KASPspoon simulates a PCR by running an approximate-match searching analysis on user-entered primer pairs against the provided sequences, and then comparing in vitro and in silico PCR results. KASPspoon reports amplimers close to or adjoining genes/SNPs/simple sequence repeats and those that are shared between in vitro and in silico PCR results to select the most appropriate amplimers for gene discovery. KASPspoon compares physical and genetic maps, and reports the primer set genome coverage for PCR-walking. KASPspoon could be used to design KASP assay primers to convert QTL acquired by classical molecular markers into high-throughput genotyping assays and to provide a major SNP resource for the dissection of genotypic and phenotypic variation. In addition to human-readable output files, KASPspoon creates Circos configurations that illustrate different in silico and in vitro results. AVAILABILITY AND IMPLEMENTATION Code available under GNU GPL at (http://www.ageri.sci.eg/index.php/facilities-services/ageri-softwares/kaspspoon). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alsamman M Alsamman
  - Department of Genome Mapping, Molecular Genetics and Genome Mapping Laboratory, Agricultural Genetic Engineering Research Institute, Giza, Egypt
- Shafik D Ibrahim
  - Department of Genome Mapping, Molecular Genetics and Genome Mapping Laboratory, Agricultural Genetic Engineering Research Institute, Giza, Egypt
- Aladdin Hamwieh
  - Department of Biotechnology, International Center for Agricultural Research in the Dry Areas (ICARDA), Cairo, Egypt
|
22
|
Abstract
Transposable elements (TEs) are ubiquitous in both prokaryotes and eukaryotes, and the dynamic character of their interaction with host genomes brings about numerous evolutionary innovations and shapes genome structure and function in a multitude of ways. In traditional classification systems, TEs are often depicted in simplistic ways, based primarily on the key enzymes required for transposition, such as transposases/recombinases and reverse transcriptases. Recent progress in whole-genome sequencing and long-read assembly, combined with expansion of the familiar range of model organisms, has resulted in the identification of unprecedentedly long transposable units spanning dozens or even hundreds of kilobases, initially in prokaryotic and more recently in eukaryotic systems. Here, we focus on such oversized eukaryotic TEs, including retrotransposons and DNA transposons, outline their complex and often combinatorial nature and closely intertwined relationship with viruses, and discuss their potential for participating in transfer of long stretches of DNA in eukaryotes.
Affiliation(s)
- Irina R Arkhipova
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts
  - Corresponding author
- Irina A Yushenova
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts
|
23
|
Characterization and evolutionary dynamics of complex regions in eukaryotic genomes. SCIENCE CHINA-LIFE SCIENCES 2019; 62:467-488. [PMID: 30810961 DOI: 10.1007/s11427-018-9458-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/01/2018] [Accepted: 11/05/2018] [Indexed: 01/07/2023]
Abstract
Complex regions in eukaryotic genomes are typically characterized by duplications of chromosomal stretches that often include one or more genes repeated in a tandem array or in relatively close proximity. Nevertheless, the repetitive nature of these regions, together with the often high sequence identity among repeats, has made complex regions particularly recalcitrant to proper molecular characterization, and they are often misassembled or completely absent in genome assemblies. This limitation has prevented accurate functional and evolutionary analyses of these regions. This is becoming increasingly relevant as evidence continues to support a central role for complex genomic regions in explaining human disease, developmental innovations, and ecological adaptations across phyla. With the advent of long-read sequencing technologies and suitable assemblers, the development of algorithms that can accommodate sample heterozygosity, and the adoption of a pangenomic-like view of these regions, accurate reconstructions of complex regions are now within reach. These reconstructions will finally allow for accurate functional and evolutionary studies of complex genomic regions, underlying the generation of genotype-phenotype maps of unprecedented resolution.
|
24
|
Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, Underwood JG, Nelson BJ, Chaisson MJP, Dougherty ML, Munson KM, Hastie AR, Diekhans M, Hormozdiari F, Lorusso N, Hoekzema K, Qiu R, Clark K, Raja A, Welch AE, Sorensen M, Baker C, Fulton RS, Armstrong J, Graves-Lindsay TA, Denli AM, Hoppe ER, Hsieh P, Hill CM, Pang AWC, Lee J, Lam ET, Dutcher SK, Gage FH, Warren WC, Shendure J, Haussler D, Schneider VA, Cao H, Ventura M, Wilson RK, Paten B, Pollen A, Eichler EE. High-resolution comparative analysis of great ape genomes. Science 2018; 360:eaar6343. [PMID: 29880660 PMCID: PMC6178954 DOI: 10.1126/science.aar6343] [Citation(s) in RCA: 239] [Impact Index Per Article: 39.8] [Received: 12/07/2017] [Accepted: 04/02/2018] [Indexed: 12/22/2022]
Abstract
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
Affiliation(s)
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- David Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Shwetha Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Stuart Cantsilieris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Olivia S Meyerson
  - Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
- Jason G Underwood
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Pacific Biosciences (PacBio) of California, Inc., Menlo Park, CA 94025, USA
- Bradley J Nelson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Mark J P Chaisson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Max L Dougherty
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Fereydoun Hormozdiari
  - Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA 95817, USA
- Nicola Lorusso
  - Department of Biology, University of Bari, Aldo Moro, Bari 70121, Italy
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ruolan Qiu
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Karen Clark
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Archana Raja
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- AnneMarie E Welch
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Robert S Fulton
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Tina A Graves-Lindsay
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Ahmet M Denli
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Emma R Hoppe
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher M Hill
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joyce Lee
  - Bionano Genomics, San Diego, CA 92121, USA
- Susan K Dutcher
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Fred H Gage
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Wesley C Warren
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Valerie A Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Han Cao
  - Bionano Genomics, San Diego, CA 92121, USA
- Mario Ventura
  - Department of Biology, University of Bari, Aldo Moro, Bari 70121, Italy
- Richard K Wilson
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alex Pollen
  - Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
25
|
Recurrent structural variation, clustered sites of selection, and disease risk for the complement factor H (CFH) gene family. Proc Natl Acad Sci U S A 2018; 115:E4433-E4442. [PMID: 29686068 DOI: 10.1073/pnas.1717600115] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/21/2022]
Abstract
Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ∼360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ∼25-35 Mya and CFHR1 and CFHR3 ∼7-13 Mya). Remarkably, all evolutionary breakpoints share a common ∼4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10⁻⁸, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10⁻³) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.
|
26
|
Cost-effective high-throughput single-haplotype iterative mapping and sequencing for complex genomic structures. Nat Protoc 2018; 13:787-809. [PMID: 29565902 DOI: 10.1038/nprot.2018.019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
The reference sequences of structurally complex regions can be obtained only through highly accurate clone-based approaches. We and others have successfully used single-haplotype iterative mapping and sequencing (SHIMS) 1.0 to assemble structurally complex regions across the sex chromosomes of several vertebrate species and to allow for targeted improvements to the reference sequences of human autosomes. However, SHIMS 1.0 is expensive and time consuming, requiring resources that only a genome center can provide. Here we introduce SHIMS 2.0, an improved SHIMS protocol that allows even a small laboratory to generate high-quality reference sequence from complex genomic regions. Using a streamlined and parallelized library-preparation protocol, and taking advantage of inexpensive high-throughput short-read-sequencing technologies, a small laboratory with both molecular biology and bioinformatics experience can sequence and assemble 192 large-insert bacterial artificial chromosome (BAC) or fosmid clones in 1 week. In SHIMS 2.0, in contrast to other pooling strategies, each clone is sequenced with a unique barcode, thus enabling clones containing nearly identical sequences to be multiplexed in a single sequencing run and assembled separately. Relative to SHIMS 1.0, SHIMS 2.0 decreases the required cost and time by two orders of magnitude while preserving high sequencing accuracy.
|
27
|
Demaerel W, Hestand MS, Vergaelen E, Swillen A, López-Sánchez M, Pérez-Jurado LA, McDonald-McGinn DM, Zackai E, Emanuel BS, Morrow BE, Breckpot J, Devriendt K, Vermeesch JR. Nested Inversion Polymorphisms Predispose Chromosome 22q11.2 to Meiotic Rearrangements. Am J Hum Genet 2017; 101:616-622. [PMID: 28965848 DOI: 10.1016/j.ajhg.2017.09.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/22/2017] [Accepted: 08/16/2017] [Indexed: 11/17/2022]
Abstract
Inversion polymorphisms between low-copy repeats (LCRs) might predispose chromosomes to meiotic non-allelic homologous recombination (NAHR) events and thus lead to genomic disorders. However, for the 22q11.2 deletion syndrome (22q11.2DS), the most common genomic disorder, no such inversions have been uncovered as of yet. Using fiber-FISH, we demonstrate that parents transmitting the de novo 3 Mb LCR22A-D 22q11.2 deletion, the reciprocal duplication, and the smaller 1.5 Mb LCR22A-B 22q11.2 deletion carry inversions of LCR22B-D or LCR22C-D. Hence, the inversions predispose chromosome 22q11.2 to meiotic rearrangements and increase the individual risk for transmitting rearrangements. Interestingly, the inversions are nested or flanking rather than coinciding with the deletion or duplication sizes. This finding raises the possibility that inversions are a prerequisite not only for 22q11.2 rearrangements but also for all NAHR-mediated genomic disorders.
Affiliation(s)
- Wolfram Demaerel
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Matthew S Hestand
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Elfi Vergaelen
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Ann Swillen
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Marcos López-Sánchez
  - Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain; Institut Hospital del Mar d'Investigacions Mèdiques, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras, Barcelona, Spain
- Luis A Pérez-Jurado
  - Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain; Institut Hospital del Mar d'Investigacions Mèdiques, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras, Barcelona, Spain
- Donna M McDonald-McGinn
  - Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Elaine Zackai
  - Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Beverly S Emanuel
  - Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Bernice E Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Jeroen Breckpot
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Koenraad Devriendt
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
- Joris R Vermeesch
  - Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
|
28
|
Collins RL, Brand H, Redin CE, Hanscom C, Antolik C, Stone MR, Glessner JT, Mason T, Pregno G, Dorrani N, Mandrile G, Giachino D, Perrin D, Walsh C, Cipicchio M, Costello M, Stortchevoi A, An JY, Currall BB, Seabra CM, Ragavendran A, Margolin L, Martinez-Agosto JA, Lucente D, Levy B, Sanders SJ, Wapner RJ, Quintero-Rivera F, Kloosterman W, Talkowski ME. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol 2017; 18:36. [PMID: 28260531 PMCID: PMC5338099 DOI: 10.1186/s13059-017-1158-6] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 09/27/2016] [Accepted: 01/20/2017] [Indexed: 12/13/2022]
Abstract
Background: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results: We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. Conclusions: These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Affiliation(s)
- Ryan L Collins
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, 02115, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Harrison Brand
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Claire E Redin
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Carrie Hanscom
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Caroline Antolik
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Matthew R Stone
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Joseph T Glessner
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Tamara Mason
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Giulia Pregno
  - Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
- Naghmeh Dorrani
  - Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
- Giorgia Mandrile
  - Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
- Daniela Giachino
  - Medical Genetics Unit, Department of Clinical and Biological Sciences, University of Torino, Orbassano, Italy
- Danielle Perrin
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Cole Walsh
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Michelle Cipicchio
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Maura Costello
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Alexei Stortchevoi
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Joon-Yong An
  - Department of Psychiatry, University of California San Francisco, San Francisco, CA, 94103, USA
- Benjamin B Currall
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Catarina M Seabra
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA; GABBA Program, University of Porto, Porto, 4099-002, Portugal
- Ashok Ragavendran
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Lauren Margolin
  - Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
- Julian A Martinez-Agosto
  - Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
- Diane Lucente
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA
- Brynn Levy
  - Department of Pathology, Columbia University, New York, NY, 10032, USA
- Stephan J Sanders
  - Department of Psychiatry, University of California San Francisco, San Francisco, CA, 94103, USA
- Ronald J Wapner
  - Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Columbia University Medical Center, New York, NY, 10032, USA
- Fabiola Quintero-Rivera
  - Department of Pathology & Laboratory Medicine and UCLA Clinical Genomics Center, David Geffen School of Medicine, University of California Los Angeles, UCLA, Los Angeles, CA, 90095, USA
- Wigard Kloosterman
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, 3584CG, The Netherlands
- Michael E Talkowski
  - Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, and Department of Neurology, Massachusetts General Hospital, Boston, MA, 02114, USA; Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, 02115, USA; Program in Population and Medical Genetics and Genomics Platform, The Broad Institute of M.I.T. and Harvard, Cambridge, MA, 02142, USA
|