1
|
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Hwang JP, Bhamidipati SV, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. CELL GENOMICS 2024; 4:100590. [PMID: 38908378 DOI: 10.1016/j.xgen.2024.100590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/13/2023] [Revised: 12/27/2023] [Accepted: 05/31/2024] [Indexed: 06/24/2024]
Abstract
The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.
Collapse
Affiliation(s)
| | | | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | - Ming Yin Lun
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | | | - KyungHee Park
- Pacific Northwest Research Institute, Seattle, WA 98122, USA
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Eva Benito
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Medhat Mahmoud
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - James Paul Hwang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sravya V Bhamidipati
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | | | | | - Sissel Juul
- Oxford Nanopore Technologies, New York, NY 10013, USA
| | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institutet, 171 76 Stockholm, Sweden; Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Fritz J Sedlazeck
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Computer Science, Rice University, Houston TX 77030, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA
| | - James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA
| | | |
Collapse
|
2
|
Grimes K, Jeong H, Amoah A, Xu N, Niemann J, Raeder B, Hasenfeld P, Stober C, Rausch T, Benito E, Jann JC, Nowak D, Emini R, Hoenicka M, Liebold A, Ho A, Shuai S, Geiger H, Sanders AD, Korbel JO. Cell-type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells. Nat Genet 2024; 56:1134-1146. [PMID: 38806714 PMCID: PMC11176070 DOI: 10.1038/s41588-024-01754-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2023] [Accepted: 04/17/2024] [Indexed: 05/30/2024]
Abstract
The functional impact and cellular context of mosaic structural variants (mSVs) in normal tissues is understudied. Utilizing Strand-seq, we sequenced 1,133 single-cell genomes from 19 human donors of increasing age, and discovered the heterogeneous mSV landscapes of hematopoietic stem and progenitor cells. While mSVs are continuously acquired throughout life, expanded subclones in our cohort are confined to individuals >60. Cells already harboring mSVs are more likely to acquire additional somatic structural variants, including megabase-scale segmental aneuploidies. Capitalizing on comprehensive single-cell micrococcal nuclease digestion with sequencing reference data, we conducted high-resolution cell-typing for eight hematopoietic stem and progenitor cells. Clonally expanded mSVs disrupt normal cellular function by dysregulating diverse cellular pathways, and enriching for myeloid progenitors. Our findings underscore the contribution of mSVs to the cellular and molecular phenotypes associated with the aging hematopoietic system, and establish a foundation for deciphering the molecular links between mSVs, aging and disease susceptibility in normal tissues.
Collapse
Affiliation(s)
- Karen Grimes
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Hyobin Jeong
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Department of Systems Biology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
| | - Amanda Amoah
- Institute of Molecular Medicine, Ulm University, Ulm, Germany
| | - Nuo Xu
- Department of Human Cell Biology and Genetics, School of Medicine, Southern University of Science and Technology, Shenzhen, China
| | - Julian Niemann
- Institute of Molecular Medicine, Ulm University, Ulm, Germany
| | - Benjamin Raeder
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Catherine Stober
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tobias Rausch
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
- Bridging Research Division on Mechanisms of Genomic Variation and Data Science, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Eva Benito
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Johann-Christoph Jann
- Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg University, Mannheim, Germany
| | - Daniel Nowak
- Department of Hematology and Oncology, Medical Faculty Mannheim of the Heidelberg University, Mannheim, Germany
| | - Ramiz Emini
- Department of Cardiothoracic and Vascular Surgery, Ulm University Hospital, Ulm, Germany
| | - Markus Hoenicka
- Department of Cardiothoracic and Vascular Surgery, Ulm University Hospital, Ulm, Germany
| | - Andreas Liebold
- Department of Cardiothoracic and Vascular Surgery, Ulm University Hospital, Ulm, Germany
| | - Anthony Ho
- Molecular Medicine Partnership Unit (MMPU), European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
- Department of Medicine V, Hematology, Oncology and Rheumatology, University of Heidelberg, Heidelberg, Germany
| | - Shimin Shuai
- Department of Human Cell Biology and Genetics, School of Medicine, Southern University of Science and Technology, Shenzhen, China
| | - Hartmut Geiger
- Institute of Molecular Medicine, Ulm University, Ulm, Germany
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany.
- Berlin Institute of Health (BIH) at Charité-Universitätsmedizin Berlin, Berlin, Germany.
- Charité-Universitätsmedizin Berlin, Berlin, Germany.
| | - Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
- Molecular Medicine Partnership Unit (MMPU), European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany.
- Bridging Research Division on Mechanisms of Genomic Variation and Data Science, German Cancer Research Center (DKFZ), Heidelberg, Germany.
| |
Collapse
|
3
|
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size1. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions2,3. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome4,5. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Collapse
Affiliation(s)
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | - Claudia R Catacchio
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, United Kingdom
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
| | - Ivan A Alexandrov
- Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
- Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
Collapse
|
4
|
Ding W, Li X, Zhang J, Ji M, Zhang M, Zhong X, Cao Y, Liu X, Li C, Xiao C, Wang J, Li T, Yu Q, Mo F, Zhang B, Qi J, Yang JC, Qi J, Tian L, Xu X, Peng Q, Zhou WZ, Liu Z, Fu A, Zhang X, Zhang JJ, Sun Y, Hu B, An NA, Zhang L, Li CY. Adaptive functions of structural variants in human brain development. SCIENCE ADVANCES 2024; 10:eadl4600. [PMID: 38579006 DOI: 10.1126/sciadv.adl4600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2023] [Accepted: 03/01/2024] [Indexed: 04/07/2024]
Abstract
Quantifying the structural variants (SVs) in nonhuman primates could provide a niche to clarify the genetic backgrounds underlying human-specific traits, but such resource is largely lacking. Here, we report an accurate SV map in a population of 562 rhesus macaques, verified by in-house benchmarks of eight macaque genomes with long-read sequencing and another one with genome assembly. This map indicates stronger selective constrains on inversions at regulatory regions, suggesting a strategy for prioritizing them with the most important functions. Accordingly, we identified 75 human-specific inversions and prioritized them. The top-ranked inversions have substantially shaped the human transcriptome, through their dual effects of reconfiguring the ancestral genomic architecture and introducing regional mutation hotspots at the inverted regions. As a proof of concept, we linked APCDD1, located on one of these inversions and down-regulated specifically in humans, to neuronal maturation and cognitive ability. We thus highlight inversions in shaping the human uniqueness in brain development.
Collapse
Affiliation(s)
- Wanqiu Ding
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xiangshang Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Mingjun Ji
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Mengling Zhang
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
| | - Xiaoming Zhong
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Center of Excellence for Leukemia Studies, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
| | - Yong Cao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, 119S Fourth Ring Rd W, Fengtai District, Beijing, China
| | - Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Chunqiong Li
- Chinese Institute for Brain Research, Beijing, China
| | - Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jiaxin Wang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Ting Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Qing Yu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Boya Zhang
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jianhuan Qi
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jie-Chun Yang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Juntian Qi
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Lu Tian
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Qi Peng
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Wei-Zhen Zhou
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zhijin Liu
- College of Life Sciences, Capital Normal University, Beijing, China
| | - Aisi Fu
- Wuhan Dgensee Clinical Laboratory, Wuhan, China
| | - Xiuqin Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
| | - Jian-Jun Zhang
- Shanxi Key Laboratory of Chinese Medicine Encephalopathy, National International Joint Research Center for Molecular Chinese Medicine, Shanxi University of Chinese Medicine, Jinzhong, China
| | - Yujie Sun
- State Key Laboratory of Membrane Biology, Biomedical Pioneer Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China
| | - Baoyang Hu
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
| | - Li Zhang
- Chinese Institute for Brain Research, Beijing, China
| | - Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- National Biomedical Imaging Center, College of Future Technology, Peking University, Beijing, China
- Southwest United Graduate School, Kunming 650092, China
| |
Collapse
|
5
|
Grochowski CM, Bengtsson JD, Du H, Gandhi M, Lun MY, Mehaffey MG, Park K, Höps W, Benito-Garagorri E, Hasenfeld P, Korbel JO, Mahmoud M, Paulin LF, Jhangiani SN, Muzny DM, Fatih JM, Gibbs RA, Pendleton M, Harrington E, Juul S, Lindstrand A, Sedlazeck FJ, Pehlivan D, Lupski JR, Carvalho CMB. Break-induced replication underlies formation of inverted triplications and generates unexpected diversity in haplotype structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.02.560172. [PMID: 37873367 PMCID: PMC10592851 DOI: 10.1101/2023.10.02.560172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/25/2023]
Abstract
Background The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis . Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long- read GS, optical genome mapping and StrandSeq the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples revealing a DNA segment of ∼2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers which contributes to generate diverse SV haplotypes in susceptible loci .
Collapse
|
6
|
Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol 2023; 41:1474-1482. [PMID: 36797493 PMCID: PMC10427740 DOI: 10.1038/s41587-023-01662-6] [Citation(s) in RCA: 70] [Impact Index Per Article: 70.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2022] [Accepted: 01/03/2023] [Indexed: 02/18/2023]
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
Collapse
Affiliation(s)
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
7
|
Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform 2023; 24:bbad297. [PMID: 37587831 PMCID: PMC10516374 DOI: 10.1093/bib/bbad297] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Revised: 07/05/2023] [Accepted: 07/23/2023] [Indexed: 08/18/2023] Open
Abstract
Structural variants (SVs) are genomic rearrangements that can take many different forms such as copy number alterations, inversions and translocations. During cell development and aging, somatic SVs accumulate in the genome with potentially neutral, deleterious or pathological effects. Generation of somatic SVs is a key mutational process in cancer development and progression. Despite their importance, the detection of somatic SVs is challenging, making them less studied than somatic single-nucleotide variants. In this review, we summarize recent advances in whole-genome sequencing (WGS)-based approaches for detecting somatic SVs at the tissue and single-cell levels and discuss their advantages and limitations. First, we describe the state-of-the-art computational algorithms for somatic SV calling using bulk WGS data and compare the performance of somatic SV detectors in the presence or absence of a matched-normal control. We then discuss the unique features of cutting-edge single-cell-based techniques for analyzing somatic SVs. The advantages and disadvantages of bulk and single-cell approaches are highlighted, along with a discussion of their sensitivity to copy-neutral SVs, usefulness for functional inferences and experimental and computational costs. Finally, computational approaches for linking somatic SVs to their functional readouts, such as those obtained from single-cell transcriptome and epigenome analyses, are illustrated, with a discussion of the promise of these approaches in health and diseases.
Collapse
Affiliation(s)
- Dohun Yi
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Jin-Wu Nam
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Research Institute for Convergence of Basic Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| | - Hyobin Jeong
- Department of Life Science, College of Natural Sciences, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Bio-BigData Center, Hanyang Institute of Bioscience and Biotechnology, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
- Hanyang Institute of Advanced BioConvergence, Hanyang University, Wangsimni-ro 222, Seongdong-gu, Seoul 04763, Republic of Korea
| |
Collapse
|
8
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 71] [Impact Index Per Article: 71.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications1-3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Collapse
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
Collapse
|
9
|
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Mao Y, Rautiainen M, Koren S, Nurk S, Porubsky D, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.30.542849. [PMID: 37398417 PMCID: PMC10312506 DOI: 10.1101/2023.05.30.542849] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/04/2023]
Abstract
We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp-a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Collapse
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison N. Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K. Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A. Alexandrov
- Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
- Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
Collapse
|
10
|
Abstract
Dense local haplotypes can now readily be extracted from long-read or droplet-based sequence data. However, these methods struggle to combine subchromosomal haplotype blocks into global chromosome-length haplotypes. Strand-seq is a single cell sequencing technique that uses read orientation to capture sparse global phase information by sequencing only one of two DNA strands for each parental homolog. In combination with dense local haplotypes from other technologies, Strand-seq data can be used to obtain complete chromosome-length phase information. In this chapter, we run the R package StrandPhaseR to phase SNVs using publicly available sequence data for sample HG005 of the Genome in a Bottle project.
Collapse
Affiliation(s)
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| |
Collapse
|
11
|
Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger MR, Porubsky D, Cheng H, Asri M, Logsdon GA, Carnevali P, Chaisson MJP, Chin CS, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton RS, Fulton LL, Garg S, Gerton JL, Ghurye J, Granat A, Green RE, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger EB, Jain M, Kirsche M, Kolmogorov M, Korbel JO, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell MW, McDaniel J, Nie F, Olsen HE, Olson ND, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg SL, Sanders AD, Schatz MC, Schmitt A, Schneider VA, Selvaraj S, Shafin K, Shumate A, Stitziel NO, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin AV, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook JM, Eichler EE, Phillippy AM, Paten B, Howe K, Miga KH. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022; 611:519-531. [PMID: 36261518 PMCID: PMC9668749 DOI: 10.1038/s41586-022-05325-5] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Accepted: 09/06/2022] [Indexed: 01/01/2023]
Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Collapse
Affiliation(s)
- Erich D. Jarvis
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA ,grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Chevy Chase, MD USA
| | - Giulio Formenti
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
| | - Arang Rhie
- grid.94365.3d0000 0001 2297 5165Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
| | - Andrea Guarracino
- grid.510779.d0000 0004 9414 6915Genomics Research Centre, Human Technopole, Viale Rita Levi-Montalcini, Milan, Italy
| | - Chentao Yang
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China
| | - Jonathan Wood
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Alan Tracey
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Francoise Thibaud-Nissen
- grid.94365.3d0000 0001 2297 5165National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
| | - Mitchell R. Vollger
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - David Porubsky
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Haoyu Cheng
- grid.65499.370000 0001 2106 9910Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA ,grid.38142.3c000000041936754XDepartment of Biomedical Informatics, Harvard Medical School, Boston, MA USA
| | - Mobin Asri
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Glennis A. Logsdon
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Paolo Carnevali
- grid.507326.50000 0004 6090 4941Chan Zuckerberg Initiative, Redwood City, CA USA
| | - Mark J. P. Chaisson
- grid.42505.360000 0001 2156 6853Quantitative and Computational Biology, University of Southern California, Los Angeles, CA USA
| | | | - Sarah Cody
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Joanna Collins
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Peter Ebert
- grid.411327.20000 0001 2176 9917Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Merly Escalona
- grid.205975.c0000 0001 0740 6917Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA USA
| | - Olivier Fedrigo
- grid.134907.80000 0001 2166 1519Vertebrate Genome Laboratory, The Rockefeller University, New York, NY USA
| | - Robert S. Fulton
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Lucinda L. Fulton
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Shilpa Garg
- grid.5254.60000 0001 0674 042XDepartment of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Jennifer L. Gerton
- grid.250820.d0000 0000 9420 1591Stowers Institute for Medical Research, Kansas City, MO USA
| | - Jay Ghurye
- grid.504403.6Dovetail Genomics, Scotts Valley, CA USA
| | | | - Richard E. Green
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - William Harvey
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Patrick Hasenfeld
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Alex Hastie
- grid.470262.50000 0004 0473 1353Bionano Genomics, San Diego, CA USA
| | - Marina Haukness
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Erich B. Jaeger
- grid.185669.50000 0004 0507 3954Illumina, Inc., San Diego, CA USA
| | - Miten Jain
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Melanie Kirsche
- grid.21107.350000 0001 2171 9311Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| | - Mikhail Kolmogorov
- grid.266100.30000 0001 2107 4242Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA USA
| | - Jan O. Korbel
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Sergey Koren
- grid.94365.3d0000 0001 2297 5165Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
| | - Jonas Korlach
- grid.423340.20000 0004 0640 9878Pacific Biosciences, Menlo Park, CA USA
| | - Joyce Lee
- grid.470262.50000 0004 0473 1353Bionano Genomics, San Diego, CA USA
| | - Daofeng Li
- grid.4367.60000 0001 2355 7002Department of Genetics, Washington University School of Medicine, St. Louis, MO USA ,grid.4367.60000 0001 2355 7002The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
| | - Tina Lindsay
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA
| | - Julian Lucas
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Feng Luo
- grid.26090.3d0000 0001 0665 0280School of Computing, Clemson University, Clemson, SC USA
| | - Tobias Marschall
- grid.411327.20000 0001 2176 9917Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Matthew W. Mitchell
- grid.282012.b0000 0004 0627 5048Coriell Institute for Medical Research, Camden, NJ USA
| | - Jennifer McDaniel
- grid.94225.38000000012158463XMaterial Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
| | - Fan Nie
- grid.216417.70000 0001 0379 7164Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Hugh E. Olsen
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Nathan D. Olson
- grid.94225.38000000012158463XMaterial Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
| | - Trevor Pesout
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Tamara Potapova
- grid.250820.d0000 0000 9420 1591Stowers Institute for Medical Research, Kansas City, MO USA
| | - Daniela Puiu
- grid.21107.350000 0001 2171 9311Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
| | - Allison Regier
- grid.511991.40000 0004 4910 5831DNAnexus, Mountain View, CA USA
| | - Jue Ruan
- grid.410727.70000 0001 0526 1937Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Steven L. Salzberg
- grid.21107.350000 0001 2171 9311Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
| | - Ashley D. Sanders
- grid.419491.00000 0001 1014 0849Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
| | - Michael C. Schatz
- grid.21107.350000 0001 2171 9311Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| | | | - Valerie A. Schneider
- grid.94365.3d0000 0001 2297 5165National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
| | | | - Kishwar Shafin
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Alaina Shumate
- grid.21107.350000 0001 2171 9311Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
| | - Nathan O. Stitziel
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA ,grid.4367.60000 0001 2355 7002Department of Genetics, Washington University School of Medicine, St. Louis, MO USA ,grid.4367.60000 0001 2355 7002Cardiovascular Division, John T. Milliken Department of Internal Medicine, Washington University School of Medicine, St. Louis, USA
| | - Catherine Stober
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - James Torrance
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Justin Wagner
- grid.94225.38000000012158463XMaterial Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
| | - Jianxin Wang
- grid.216417.70000 0001 0379 7164Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Aaron Wenger
- grid.423340.20000 0004 0640 9878Pacific Biosciences, Menlo Park, CA USA
| | - Chuanle Xiao
- grid.12981.330000 0001 2360 039XState Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Aleksey V. Zimin
- grid.21107.350000 0001 2171 9311Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD USA
| | - Guojie Zhang
- grid.13402.340000 0004 1759 700XCenter for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
| | - Ting Wang
- grid.4367.60000 0001 2355 7002McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO USA ,grid.4367.60000 0001 2355 7002Department of Genetics, Washington University School of Medicine, St. Louis, MO USA ,grid.4367.60000 0001 2355 7002The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO USA
| | - Heng Li
- grid.65499.370000 0001 2106 9910Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
| | - Erik Garrison
- grid.267301.10000 0004 0386 9246Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN USA
| | - David Haussler
- grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Chevy Chase, MD USA ,grid.205975.c0000 0001 0740 6917Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA USA
| | - Ira Hall
- grid.47100.320000000419368710Yale School of Medicine, New Haven, CT USA
| | - Justin M. Zook
- grid.94225.38000000012158463XMaterial Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD USA
| | - Evan E. Eichler
- grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Chevy Chase, MD USA ,grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA USA
| | - Adam M. Phillippy
- grid.94365.3d0000 0001 2297 5165Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA
| | - Benedict Paten
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | - Kerstin Howe
- grid.10306.340000 0004 0606 5382Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Karen H. Miga
- grid.205975.c0000 0001 0740 6917UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA USA
| | | |
Collapse
|
12
|
Hanlon VCT, Lansdorp PM, Guryev V. A survey of current methods to detect and genotype inversions. Hum Mutat 2022; 43:1576-1589. [PMID: 36047337 DOI: 10.1002/humu.24458] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Revised: 08/26/2022] [Accepted: 08/29/2022] [Indexed: 11/11/2022]
Abstract
Polymorphic inversions are ubiquitous in humans, and they have been linked to both adaptation and disease. Following their discovery in Drosophila more than a century ago, inversions have proved to be more elusive than other structural variants. A wide variety of methods for the detection and genotyping of inversions have recently been developed: multiple techniques based on selective amplification by PCR, short- and long-read sequencing approaches, principal component analysis of small variant haplotypes, template strand sequencing, optical mapping, and various genome assembly methods. Many methods apply complex wet lab protocols or increasingly refined bioinformatic analyses. This review is an attempt to provide a practical summary and comparison of the methods that are in current use, with a focus on metrics such as the maximum size of segmental duplications at inversion breakpoints that each method can tolerate, the size range of inversions that they recover, their throughput, and whether the locations of putative inversions must be known beforehand. This article is protected by copyright. All rights reserved.
Collapse
Affiliation(s)
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, V5Z 1L3, Canada.,Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, The Netherlands
| |
Collapse
|
13
|
Hanlon VC, Chan DD, Hamadeh Z, Wang Y, Mattsson CA, Spierings DC, Coope RJ, Lansdorp PM. Construction of Strand-seq libraries in open nanoliter arrays. CELL REPORTS METHODS 2022; 2:100150. [PMID: 35474869 PMCID: PMC9017222 DOI: 10.1016/j.crmeth.2021.100150] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/22/2021] [Revised: 10/22/2021] [Accepted: 12/17/2021] [Indexed: 12/22/2022]
Abstract
Single-cell Strand-seq generates directional genomic information to study DNA repair, assemble genomes, and map structural variation onto chromosome-length haplotypes. We report a nanoliter-volume, one-pot (OP) Strand-seq library preparation protocol in which reagents are added cumulatively, DNA purification steps are avoided, and enzymes are inactivated with a thermolabile protease. OP-Strand-seq libraries capture 10%-25% of the genome from a single-cell with reduced costs and increased throughput.
Collapse
Affiliation(s)
| | - Daniel D. Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Zeid Hamadeh
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | | | - Diana C.J. Spierings
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, the Netherlands
| | - Robin J.N. Coope
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
| | - Peter M. Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, the Netherlands
- Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| |
Collapse
|
14
|
Hanlon VCT, Mattsson CA, Spierings DCJ, Guryev V, Lansdorp PM. InvertypeR: Bayesian inversion genotyping with Strand-seq data. BMC Genomics 2021; 22:582. [PMID: 34332539 PMCID: PMC8325862 DOI: 10.1186/s12864-021-07892-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Accepted: 07/15/2021] [Indexed: 11/23/2022] Open
Abstract
Background Single cell Strand-seq is a unique tool for the discovery and phasing of genomic inversions. Conventional methods to discover inversions with Strand-seq data are blind to known inversion locations, limiting their statistical power for the detection of inversions smaller than 10 Kb. Moreover, the methods rely on manual inspection to separate false and true positives. Results Here we describe “InvertypeR”, a method based on a Bayesian binomial model that genotypes inversions using fixed genomic coordinates. We validated InvertypeR by re-genotyping inversions reported for three trios by the Human Genome Structural Variation Consortium. Although 6.3% of the family inversion genotypes in the original study showed Mendelian discordance, this was reduced to 0.5% using InvertypeR. By applying InvertypeR to published inversion coordinates and predicted inversion hotspots (n = 3701), as well as coordinates from conventional inversion discovery, we furthermore genotyped 66 inversions not previously reported for the three trios. Conclusions InvertypeR discovers, genotypes, and phases inversions without relying on manual inspection. For greater accessibility, results are presented as phased chromosome ideograms with inversions linked to Strand-seq data in the genome browser. InvertypeR increases the power of Strand-seq for studies on the role of inversions in phenotypic variation, genome instability, and human disease. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07892-9.
Collapse
Affiliation(s)
- Vincent C T Hanlon
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada.
| | - Carl-Adam Mattsson
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada
| | - Diana C J Spierings
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, The Netherlands
| | - Peter M Lansdorp
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, V5Z 1L3, Canada.,European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, The Netherlands.,Departments of Medical Genetics and Hematology, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
| |
Collapse
|
15
|
Kronenberg ZN, Rhie A, Koren S, Concepcion GT, Peluso P, Munson KM, Porubsky D, Kuhn K, Mueller KA, Low WY, Hiendleder S, Fedrigo O, Liachko I, Hall RJ, Phillippy AM, Eichler EE, Williams JL, Smith TPL, Jarvis ED, Sullivan ST, Kingan SB. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat Commun 2021; 12:1935. [PMID: 33911078 PMCID: PMC8081726 DOI: 10.1038/s41467-020-20536-y] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 11/12/2020] [Indexed: 01/27/2023] Open
Abstract
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assembles to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
Collapse
Affiliation(s)
- Zev N Kronenberg
- Phase Genomics, Seattle, WA, USA.
- Pacific Biosciences, Menlo Park, CA, USA.
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | | | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kristen Kuhn
- US Meat Animal Research Center, ARS USDA, Clay Center, NE, USA
| | | | - Wai Yee Low
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
| | - Stefan Hiendleder
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
| | - Olivier Fedrigo
- Vertebrate Genomes Laboratory, The Rockefeller University, New York, NY, USA
| | | | | | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - John L Williams
- Davies Research Centre, School of Animal and Veterinary Sciences, The University of Adelaide, Roseworthy, SA, Australia
- Dipartimento di Scienze Animali, della Nutrizione e degli Alimenti, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
| | | | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | | |
Collapse
|
16
|
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 307] [Impact Index Per Article: 102.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Collapse
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
| | - Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | | | | | | | | | | | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
| | - Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | | | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | | | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
Collapse
|
17
|
Porubsky D, Ebert P, Audano PA, Vollger MR, Harvey WT, Marijon P, Ebler J, Munson KM, Sorensen M, Sulovari A, Haukness M, Ghareghani M, Lansdorp PM, Paten B, Devine SE, Sanders AD, Lee C, Chaisson MJP, Korbel JO, Eichler EE, Marschall T. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat Biotechnol 2021; 39:302-308. [PMID: 33288906 PMCID: PMC7954704 DOI: 10.1038/s41587-020-0719-5] [Citation(s) in RCA: 88] [Impact Index Per Article: 29.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 09/16/2020] [Indexed: 12/18/2022]
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing1,2 with continuous long-read or high-fidelity3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
Collapse
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pierre Marijon
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Jana Ebler
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Maryam Ghareghani
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Center for Bioinformatics, Saarland University, and Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Department of Life Science, Ewha Womans University, Seoul, Republic of Korea
| | - Mark J P Chaisson
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany.
| |
Collapse
|
18
|
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 169] [Impact Index Per Article: 56.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution1,2. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Collapse
|
19
|
Single-cell strand sequencing of a macaque genome reveals multiple nested inversions and breakpoint reuse during primate evolution. Genome Res 2020; 30:1680-1693. [PMID: 33093070 PMCID: PMC7605249 DOI: 10.1101/gr.265322.120] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2020] [Accepted: 09/02/2020] [Indexed: 12/14/2022]
Abstract
Rhesus macaque is an Old World monkey that shared a common ancestor with human ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions between 859 bp and 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is always derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversions breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach, such as single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding of the biology and evolution of primate species.
Collapse
|
20
|
Recurrent inversion toggling and great ape genome evolution. Nat Genet 2020; 52:849-858. [PMID: 32541924 PMCID: PMC7415573 DOI: 10.1038/s41588-020-0646-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2019] [Accepted: 05/15/2020] [Indexed: 01/14/2023]
Abstract
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
Collapse
|
21
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Collapse
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
| |
Collapse
|