1
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Claudia R Catacchio
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies, Oxford, United Kingdom
- Julian K Lucas
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- Ivan A Alexandrov
- Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
- Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
2
Rangwala SH, Rudnev DV, Ananiev VV, Oh DH, Asztalos A, Benica B, Borodin EA, Bouk N, Evgeniev VI, Kodali VK, Lotov V, Mozes E, Omelchenko MV, Savkina S, Sukharnikov E, Virothaisakun J, Murphy TD, Pruitt KD, Schneider VA. The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments. PLoS Biol 2024; 22:e3002405. [PMID: 38713717 PMCID: PMC11101090 DOI: 10.1371/journal.pbio.3002405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 05/17/2024] [Accepted: 04/08/2024] [Indexed: 05/09/2024] Open
Abstract
We report a new visualization tool for analysis of whole-genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.gov/genome/cgv/). CGV visualizes pairwise same-species and cross-species alignments provided by National Center for Biotechnology Information (NCBI) using assembly alignment algorithms developed by us and others. Researchers can examine large structural differences spanning chromosomes, such as inversions or translocations. Users can also navigate to regions of interest, where they can detect and analyze smaller-scale deletions and rearrangements within specific chromosome or gene regions. RefSeq or user-provided gene annotation is displayed where available. CGV currently provides approximately 800 alignments from over 350 animal, plant, and fungal species. CGV and related NCBI viewers are undergoing active development to further meet needs of the research community in comparative genome visualization.
Affiliation(s)
- Sanjida H. Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Dmitry V. Rudnev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Victor V. Ananiev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Dong-Ha Oh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Andrea Asztalos
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Barrett Benica
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Evgeny A. Borodin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Nathan Bouk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Vladislav I. Evgeniev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Vamsi K. Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Vadim Lotov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Eyal Mozes
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Marina V. Omelchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Sofya Savkina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Ekaterina Sukharnikov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Joël Virothaisakun
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Terence D. Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Kim D. Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
3
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. medRxiv 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
Affiliation(s)
- Ayse Keskus
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A. Lansdon
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Samuel Sacco
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K. Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Irina Pushel
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S. Farooqi
- Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
4
Porubsky D, Eichler EE. A 25-year odyssey of genomic technology advances and structural variant discovery. Cell 2024; 187:1024-1037. [PMID: 38290514 DOI: 10.1016/j.cell.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
5
Yazarloo M, Sarafraz MR, Jabbari S, Gholipour T, Hashemi T. Comparison of retrospective and prospective memory in subtypes of obsessive-compulsive disorder. Eur J Transl Myol 2024; 34:12221. [PMID: 38344936 PMCID: PMC11017170 DOI: 10.4081/ejtm.2024.12221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 01/12/2024] [Indexed: 03/28/2024] Open
Abstract
Retrospective and prospective memory deficits play a role in maintaining and perpetuating the symptoms of obsessive-compulsive disorder (OCD), but less is known about these deficits in different subtypes of OCD. The aim of the present study was to evaluate retrospective and prospective memory in patients suffering from cleaning, checking, symmetry, and religious obsessions. Using a causal-comparative design, 60 participants aged 28 to 55 were selected in 2023 by convenience sampling and placed in five groups: individuals with cleaning, checking, symmetry, or religious obsessions, and a healthy group. Participants completed self-report questionnaires and neurocognitive tools. Results showed that deficits in retrospective memory were significant in all types of obsessions (p<0.05) except religious obsessions, and that this deficit was more severe in checking obsessions than in the other types of OCD. The deficit in prospective memory was significant only in checking obsessions (p<0.05). Retrospective and prospective memory impairments and their relationship with deficits in executive functions can differ depending on the type of OCD. Based on the findings, impairment of executive function, acting indirectly through the impairment of other cognitive mechanisms, diminishes confidence in retrospective and prospective memory, which leads to compulsive behaviors in individuals with contamination and checking obsessions. The impairment of retrospective memory in symmetry obsessions might be related to information encoding, which in turn leads to difficulty recalling information from memory.
Affiliation(s)
- Saeide Jabbari
- Faculty of Educational Sciences and psychology, University of Tabriz, Tabriz.
- Taraneh Gholipour
- Department of Psychology, Islamic Azad University, Bandargaz branch, Bandargaz.
6
Yang L, Metzger GA, Padilla Del Valle R, Delgadillo Rubalcaba D, McLaughlin RN. Evolutionary insights from profiling LINE-1 activity at allelic resolution in a single human genome. EMBO J 2024; 43:112-131. [PMID: 38177314 PMCID: PMC10883270 DOI: 10.1038/s44318-023-00007-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/18/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024] Open
Abstract
Transposable elements have created the majority of the sequence in many genomes. In mammals, LINE-1 retrotransposons have been expanding for more than 100 million years as distinct, consecutive lineages; however, the drivers of this recurrent lineage emergence and disappearance are unknown. Most human genome assemblies provide a record of this ancient evolution, but fail to resolve ongoing LINE-1 retrotranspositions. Utilizing the human CHM1 long-read-based haploid assembly, we identified and cloned all full-length, intact LINE-1s, and found 29 LINE-1s with measurable in vitro retrotransposition activity. Among individuals, these LINE-1s varied in their presence, their allelic sequences, and their activity. We found that recently retrotransposed LINE-1s tend to be active in vitro and polymorphic in the population relative to more ancient LINE-1s. However, some rare allelic forms of old LINE-1s retain activity, suggesting older lineages can persist longer than expected. Finally, in LINE-1s with in vitro activity and in vivo fitness, we identified mutations that may have increased replication in ancient genomes and may prove promising candidates for mechanistic investigations of the drivers of LINE-1 evolution and which LINE-1 sequences contribute to human disease.
Affiliation(s)
- Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
- Ricky Padilla Del Valle
- Pacific Northwest Research Institute, Seattle, WA, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
- Richard N McLaughlin
- Pacific Northwest Research Institute, Seattle, WA, USA.
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA.
7
Volpe E, Corda L, Tommaso ED, Pelliccia F, Ottalevi R, Licastro D, Guarracino A, Capulli M, Formenti G, Tassone E, Giunta S. The complete diploid reference genome of RPE-1 identifies human phased epigenetic landscapes. bioRxiv 2023:2023.11.01.565049. [PMID: 38168337 PMCID: PMC10760208 DOI: 10.1101/2023.11.01.565049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises the question of whether human reference genomes are adequate for accurately analyzing sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly for human retinal epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to use as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly presents completely phased haplotypes and chromosome-level scaffolds that span centromeres with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation of this cell line, including t(Xq;10q), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted chromosome X telomere. Polymorphisms between haplotypes of the same genome reveal genetic and epigenetic variation for all chromosomes, especially at centromeres. Using the RPE-1 assembly as a matched reference genome improves the mapping quality of multi-omics reads originating from RPE-1 cells, with a drastic reduction in alignment mismatches compared to using the most complete human reference to date (CHM13). Leveraging the accuracy achieved using a matched reference, we were able to identify the kinetochore sites at base pair resolution and show unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multi-omics analyses and serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
Affiliation(s)
- Emilia Volpe
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Luca Corda
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Elena Di Tommaso
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Franca Pelliccia
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Riccardo Ottalevi
- Department of Bioinformatic, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065 USA and S.s.17, 67100, L’Aquila, Italy
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Mattia Capulli
- Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Giulio Formenti
- The Rockefeller University, 1230 York Avenue, 10065 New York, USA
- Evelyne Tassone
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Simona Giunta
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
8
He Y, Chu Y, Guo S, Hu J, Li R, Zheng Y, Ma X, Du Z, Zhao L, Yu W, Xue J, Bian W, Yang F, Chen X, Zhang P, Wu R, Ma Y, Shao C, Chen J, Wang J, Li J, Wu J, Hu X, Long Q, Jiang M, Ye H, Song S, Li G, Wei Y, Xu Y, Ma Y, Chen Y, Wang K, Bao J, Xi W, Wang F, Ni W, Zhang M, Yu Y, Li S, Kang Y, Gao Z. T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese. Genomics Proteomics Bioinformatics 2023; 21:1085-1100. [PMID: 37595788 PMCID: PMC11082261 DOI: 10.1016/j.gpb.2023.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/18/2023] [Revised: 08/01/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality, and the recently released telomere-to-telomere (T2T) version, T2T-CHM13, reaches its highest level of continuity and accuracy after 20 years of effort by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. Here, to provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we assembled the genome of a male Han Chinese individual, T2T-YAO, which includes T2T assemblies of all the 22 + X + M and 22 + Y chromosomes in both haplotypes. The quality of T2T-YAO is much better than that of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches the top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼330 Mb of exclusive sequence, ∼3100 unique genes, and tens of thousands of nucleotide and structural variations as compared with CHM13, highlighting the necessity of a population-stratified reference genome. The construction of T2T-YAO, an accurate and authentic representative of the Chinese population, would enable precise delineation of genomic variations and advance our understanding of the heritability of diseases and phenotypes, especially within the context of the unique variations of the Chinese population.
Affiliation(s)
- Yukun He
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China
- Yanan Chu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Shuming Guo
- Linfen Clinical Medicine Research Center, Linfen 041000, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China
- Jiang Hu
- GrandOmics Biosciences Co., Ltd, Wuhan 430076, China
- Ran Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Yali Zheng
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
- Xinqian Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Zhenglin Du
- Institute of PSI Genomics, Wenzhou 325024, China
- Lili Zhao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Wenyi Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Jianbo Xue
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Wenjie Bian
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Feifei Yang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Xi Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Pingan Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Rihan Wu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
- Yifan Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Changjun Shao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jing Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jiwei Li
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Jing Wu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Xiaoyi Hu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Qiuyue Long
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Mingzheng Jiang
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Hongli Ye
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Shixu Song
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Guangyao Li
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yue Wei
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yu Xu
- Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China
| | - Yanliang Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yanwen Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Keqiang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Jing Bao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wen Xi
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Fang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wentao Ni
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Moqin Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yan Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Shengnan Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yu Kang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100490, China.
| | - Zhancheng Gao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China.
| |
|
9
|
Rangwala SH, Rudnev DV, Ananiev VV, Asztalos A, Benica B, Borodin EA, Bouk N, Evgeniev VI, Kodali VK, Lotov V, Mozes E, Oh DH, Omelchenko MV, Savkina S, Sukharnikov E, Virothaisakun J, Murphy TD, Pruitt KD, Schneider VA. Interactive visualization of whole eukaryote genome alignments using NCBI's Comparative Genome Viewer (CGV). BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.30.564672. [PMID: 38077029 PMCID: PMC10705539 DOI: 10.1101/2023.10.30.564672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
We report a new visualization tool for analysis of whole genome assembly-assembly alignments, the Comparative Genome Viewer (CGV) (https://ncbi.nlm.nih.gov/genome/cgv/). CGV visualizes pairwise same-species and cross-species alignments provided by NCBI using assembly alignment algorithms developed by us and others. Researchers can examine the alignments between the two assemblies using two alternate views: a chromosome ideogram-based view or a 2D genome dotplot. Whole genome alignment views expose large structural differences spanning chromosomes, such as inversions or translocations. Users can also navigate to regions of interest, where they can detect and analyze smaller-scale deletions and rearrangements within specific chromosome or gene regions. RefSeq or user-provided gene annotation is displayed in the ideogram view where available. CGV currently provides approximately 700 alignments from over 300 animal, plant, and fungal species. CGV and related NCBI viewers are undergoing active development to further meet needs of the research community in comparative genome visualization.
Affiliation(s)
- Sanjida H Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Dmitry V Rudnev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Victor V Ananiev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Andrea Asztalos
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Barrett Benica
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Evgeny A Borodin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Nathan Bouk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Vladislav I Evgeniev
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Vamsi K Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Vadim Lotov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Eyal Mozes
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Dong-Ha Oh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Marina V Omelchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Sofya Savkina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Ekaterina Sukharnikov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Joël Virothaisakun
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Terence D. Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20894, USA
| |
|
10
|
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Mao Y, Rautiainen M, Koren S, Nurk S, Porubsky D, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.30.542849. [PMID: 37398417 PMCID: PMC10312506 DOI: 10.1101/2023.05.30.542849] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 07/04/2023]
Abstract
We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison N. Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K. Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A. Alexandrov
- Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
- Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| |
|
11
|
Ahmed OY, Rossi M, Gagie T, Boucher C, Langmead B. SPUMONI 2: improved classification using a pangenome index of minimizer digests. Genome Biol 2023; 24:122. [PMID: 37202771 PMCID: PMC10197461 DOI: 10.1186/s13059-023-02958-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 05/03/2023] [Indexed: 05/20/2023] Open
Abstract
Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 achieves a speed improvement of 3-fold compared to SPUMONI and 15-fold compared to minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection and multi-class metagenomics classification.
Affiliation(s)
- Omar Y. Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| | - Massimiliano Rossi
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Travis Gagie
- Faculty of Computer Science, Dalhousie University, Halifax, NS Canada
| | - Christina Boucher
- Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD USA
| |
|
12
|
Weisburd B, Tiao G, Rehm HL. Insights from a genome-wide truth set of tandem repeat variation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.05.539588. [PMID: 37214979 PMCID: PMC10197592 DOI: 10.1101/2023.05.05.539588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Tools for genotyping tandem repeats (TRs) from short read sequencing data have improved significantly over the past decade. Extensive comparisons of these tools to gold standard diagnostic methods like RP-PCR have confirmed their accuracy for tens to hundreds of well-studied loci. However, a scarcity of high-quality orthogonal truth data limited our ability to measure tool accuracy for the millions of other loci throughout the genome. To address this, we developed a TR truth set based on the Synthetic Diploid Benchmark (SynDip). By identifying the subset of insertions and deletions that represent TR expansions or contractions with motifs between 2 and 50 base pairs, we obtained accurate genotypes for 139,795 pure and 6,845 interrupted repeats in a single diploid sample. Our approach did not require running existing genotyping tools on short read or long read sequencing data and provided an alternative, more accurate view of tandem repeat variation. We applied this truth set to compare the strengths and weaknesses of widely-used tools for genotyping TRs, evaluated the completeness of existing genome-wide TR catalogs, and explored the properties of tandem repeat variation throughout the genome. We found that, without filtering, ExpansionHunter had higher accuracy than GangSTR and HipSTR over a wide range of motifs and allele sizes. Also, when errors in allele size occurred, ExpansionHunter tended to overestimate expansion sizes, while GangSTR tended to underestimate them. Additionally, we saw that widely-used TR catalogs miss between 16% and 41% of variant loci in the truth set. These results suggest that genome-wide analyses would benefit from genotyping a larger set of loci as well as further tool development that builds on the strengths of current algorithms. 
To that end, we developed a new catalog of 2.8 million loci that captures 95% of variant loci in the truth set, and created a modified version of ExpansionHunter that runs 2 to 3x faster than the original while producing the same output.
Affiliation(s)
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| |
|
13
|
Muilenburg KM, Isder CC, Radhakrishnan P, Batra SK, Ly QP, Carlson MA, Bouvet M, Hollingsworth MA, Mohs AM. Mucins as contrast agent targets for fluorescence-guided surgery of pancreatic cancer. Cancer Lett 2023; 561:216150. [PMID: 36997106 PMCID: PMC10150776 DOI: 10.1016/j.canlet.2023.216150] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/04/2023] [Revised: 03/16/2023] [Accepted: 03/26/2023] [Indexed: 03/31/2023]
Abstract
Pancreatic cancer is difficult to resect due to its unique challenges, often leading to incomplete tumor resections. Fluorescence-guided surgery (FGS), also known as intraoperative molecular imaging and optical surgical navigation, is an intraoperative tool that can aid surgeons in complete tumor resection through an increased ability to detect the tumor. To target the tumor, FGS contrast agents rely on biomarkers aberrantly expressed in malignant tissue compared to normal tissue. These biomarkers allow clinicians to identify the tumor and its stage before surgical resection and provide a contrast agent target for intraoperative imaging. Mucins, a family of glycoproteins, are upregulated in malignant tissue compared to normal tissue. Therefore, these proteins may serve as biomarkers for surgical resection. Intraoperative imaging of mucin expression in pancreatic cancer can potentially increase the number of complete resections. While some mucins have been studied for FGS, the potential ability to function as a biomarker target extends to the entire mucin family. Therefore, mucins are attractive proteins to investigate more broadly as FGS biomarkers. This review summarizes the biomarker traits of mucins and their potential use in FGS for pancreatic cancer.
Affiliation(s)
- Kathryn M Muilenburg
- Department of Pharmaceutical Sciences, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA.
| | - Carly C Isder
- Department of Pharmaceutical Sciences, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA.
| | - Prakash Radhakrishnan
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA.
| | - Surinder K Batra
- Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, S 45th St, Omaha, NE, 68198, USA.
| | - Quan P Ly
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Department of Surgery, University of Nebraska Medical Center, 983280 Nebraska Medical Center, Omaha, NE, 68198-3280, USA.
| | - Mark A Carlson
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Department of Surgery, University of Nebraska Medical Center, 983280 Nebraska Medical Center, Omaha, NE, 68198-3280, USA.
| | - Michael Bouvet
- Department of Surgery, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA; VA San Diego Healthcare System, 3350 La Jolla Village Dr, San Diego, CA, 92161, USA.
| | - Michael A Hollingsworth
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA.
| | - Aaron M Mohs
- Department of Pharmaceutical Sciences, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, 505 S 45th St, Omaha, NE, 68198, USA; Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, S 45th St, Omaha, NE, 68198, USA.
| |
|
14
|
Rodriguez OL, Silver CA, Shields K, Smith ML, Watson CT. Targeted long-read sequencing facilitates phased diploid assembly and genotyping of the human T cell receptor alpha, delta, and beta loci. CELL GENOMICS 2022; 2:100228. [PMID: 36778049 PMCID: PMC9903726 DOI: 10.1016/j.xgen.2022.100228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/02/2022] [Revised: 08/25/2022] [Accepted: 11/05/2022] [Indexed: 12/02/2022]
Abstract
T cell receptors (TCRs) recognize peptide fragments presented by the major histocompatibility complex (MHC) and are critical to T cell-mediated immunity. Recent data have indicated that genetic diversity within TCR-encoding gene regions is underexplored, limiting understanding of the impact of TCR loci polymorphisms on TCR function in disease, even though TCR repertoire signatures (1) are heritable and (2) associate with disease phenotypes. To address this, we developed a targeted long-read sequencing approach to generate highly accurate haplotype resolved assemblies of the TCR beta (TRB) and alpha/delta (TRA/D) loci, facilitating the genotyping of all variant types, including structural variants. We validate our approach using two mother-father-child trios and 5 unrelated donors representing multiple populations. This resulted in improved genotyping accuracy and the discovery of 84 undocumented V, D, J, and C alleles, demonstrating the utility of this framework for improving our understanding of TCR diversity and function in disease.
Affiliation(s)
- Oscar L. Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
| | - Catherine A. Silver
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
| | - Kaitlyn Shields
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
| | - Melissa L. Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
| | - Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA (corresponding author)
| |
|
15
|
Abstract
Centromeres are key elements for chromosome segregation. Canonical centromeres are built over long stretches of tandem repetitive arrays. Despite being quite abundant compared to other loci, centromere sequences overall still represent only 2 to 5% of the human genome; therefore, studying their genetic and epigenetic features is a major challenge. Furthermore, sequencing of centromeric regions requires high coverage to fully analyze length and sequence variations, and this can be extremely costly. To bypass these issues, we have developed a technique, named CenRICH, to enrich for centromeric DNA from human cells based on selective restriction digestion and size fractionation. Combining restriction enzymes cutting at high frequency throughout the genome, except within most human centromeres, with size-selection of fragments >20 kb, resulted in over 25-fold enrichment in centromeric DNA. High-throughput sequencing revealed that up to 60% of the DNA in the enriched samples is made of centromeric repeats. We show that this method can be used in combination with long-read sequencing to investigate the DNA methylation status of certain centromeres and, with a specific enzyme combination, also of their surrounding regions (mainly HSATII). Finally, we show that CenRICH facilitates single-molecule analysis of replicating centromeric fibers by DNA combing. This approach has great potential for making sequencing of centromeric DNA more affordable and efficient and for single DNA molecule studies.
|
16
|
Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, Taylor DJ, Shafin K, Shumate A, Xiao C, Wagner J, McDaniel J, Olson ND, Sauria MEG, Vollger MR, Rhie A, Meredith M, Martin S, Lee J, Koren S, Rosenfeld JA, Paten B, Layer R, Chin CS, Sedlazeck FJ, Hansen NF, Miller DE, Phillippy AM, Miga KH, McCoy RC, Dennis MY, Zook JM, Schatz MC. A complete reference genome improves analysis of human genetic variation. Science 2022; 376:eabl3533. [PMID: 35357935 DOI: 10.1126/science.abl3533] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Indexed: 12/11/2022]
Abstract
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie M Yan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel Avdeyev
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
| | - Justin Wagner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Skylar Martin
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ryan Layer
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Rockville, MD, USA
| | - Danny E Miller
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Justin M Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
17
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 976] [Impact Index Per Article: 488.0] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
| | - Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | | | | | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA, USA
| | | | | | | | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
| | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | | | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
| | | | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
| | - Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | | | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore, KA, India
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | | | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | | | - Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA, USA
| | - Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
| | - James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
| | - Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
| | | | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston, TX, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; Davis, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
| | | | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; Davis, CA, USA
| | - Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
| | | | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | | | - Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; Davis, CA, USA
| | - Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
| | - Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
18
Antonarakis SE. Short arms of human acrocentric chromosomes and the completion of the human genome sequence. Genome Res 2022; 32:599-607. [PMID: 35361624 PMCID: PMC8997349 DOI: 10.1101/gr.275350.121] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/24/2022]
Abstract
The complete, ungapped sequence of the short arms of human acrocentric chromosomes (SAACs) is still unknown almost 20 years after the near completion of the Human Genome Project. Yet these short arms of Chromosomes 13, 14, 15, 21, and 22 contain the ribosomal DNA (rDNA) genes, which are of paramount importance for human biology. The sequences of SAACs show an extensive variation in the copy number of the various repetitive elements, the full extent of which is currently unknown. In addition, the full spectrum of repeated sequences, their organization, and the low copy number functional elements are also unknown. The Telomere-to-Telomere (T2T) Project using mainly long-read sequence technology has recently completed the assembly of the genome from a hydatidiform mole, CHM13, and has thus established a baseline reference for further studies on the organization, variation, functional annotation, and impact in human disorders of all the previously unknown genomic segments, including the SAACs. The publication of the initial results of the T2T Project will update and improve the reference genome for a better understanding of the evolution and function of the human genome.
Affiliation(s)
- Stylianos E Antonarakis
- Department of Genetic Medicine and Development, University of Geneva Medical Faculty, 1211 Geneva, Switzerland; Foundation Campus Biotech, 1202 Geneva, Switzerland; Medigenome, Swiss Institute of Genomic Medicine, 1207 Geneva, Switzerland
19
Using induced pluripotent stem cells to investigate human neuronal phenotypes in 1q21.1 deletion and duplication syndrome. Mol Psychiatry 2022; 27:819-830. [PMID: 34112971 PMCID: PMC9054650 DOI: 10.1038/s41380-021-01182-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/16/2021] [Revised: 05/17/2021] [Accepted: 05/27/2021] [Indexed: 01/08/2023]
Abstract
Copy Number Variation (CNV) at the 1q21.1 locus is associated with a range of neurodevelopmental and psychiatric disorders in humans, including abnormalities in head size and motor deficits. Yet, the functional consequences of these CNVs (both deletion and duplication) on neuronal development remain unknown. To determine the impact of CNV at the 1q21.1 locus on neuronal development, we generated induced pluripotent stem cells from individuals harbouring 1q21.1 deletion or duplication and differentiated them into functional cortical neurons. We show that neurons with 1q21.1 deletion or duplication display reciprocal phenotype with respect to proliferation, differentiation potential, neuronal maturation, synaptic density and functional activity. Deletion of the 1q21.1 locus was also associated with an increased expression of lower cortical layer markers. This difference was conserved in the mouse model of 1q21.1 deletion, which displayed altered corticogenesis. Importantly, we show that neurons with 1q21.1 deletion and duplication are associated with differential expression of calcium channels and demonstrate that physiological deficits in neurons with 1q21.1 deletion or duplication can be pharmacologically modulated by targeting Ca2+ channel activity. These findings provide biological insight into the neuropathological mechanism underlying 1q21.1 associated brain disorder and indicate a potential target for therapeutic interventions.
20
Sønderby IE, Ching CRK, Thomopoulos SI, van der Meer D, Sun D, Villalon‐Reina JE, Agartz I, Amunts K, Arango C, Armstrong NJ, Ayesa‐Arriola R, Bakker G, Bassett AS, Boomsma DI, Bülow R, Butcher NJ, Calhoun VD, Caspers S, Chow EWC, Cichon S, Ciufolini S, Craig MC, Crespo‐Facorro B, Cunningham AC, Dale AM, Dazzan P, de Zubicaray GI, Djurovic S, Doherty JL, Donohoe G, Draganski B, Durdle CA, Ehrlich S, Emanuel BS, Espeseth T, Fisher SE, Ge T, Glahn DC, Grabe HJ, Gur RE, Gutman BA, Haavik J, Håberg AK, Hansen LA, Hashimoto R, Hibar DP, Holmes AJ, Hottenga J, Hulshoff Pol HE, Jalbrzikowski M, Knowles EEM, Kushan L, Linden DEJ, Liu J, Lundervold AJ, Martin‐Brevet S, Martínez K, Mather KA, Mathias SR, McDonald‐McGinn DM, McRae AF, Medland SE, Moberget T, Modenato C, Monereo Sánchez J, Moreau CA, Mühleisen TW, Paus T, Pausova Z, Prieto C, Ragothaman A, Reinbold CS, Reis Marques T, Repetto GM, Reymond A, Roalf DR, Rodriguez‐Herreros B, Rucker JJ, Sachdev PS, Schmitt JE, Schofield PR, Silva AI, Stefansson H, Stein DJ, Tamnes CK, Tordesillas‐Gutiérrez D, Ulfarsson MO, Vajdi A, van 't Ent D, van den Bree MBM, Vassos E, Vázquez‐Bourgon J, Vila‐Rodriguez F, Walters GB, Wen W, Westlye LT, Wittfeld K, Zackai EH, Stefánsson K, Jacquemont S, Thompson PM, Bearden CE, Andreassen OA. Effects of copy number variations on brain structure and risk for psychiatric illness: Large-scale studies from the ENIGMA working groups on CNVs. Hum Brain Mapp 2022; 43:300-328. [PMID: 33615640 PMCID: PMC8675420 DOI: 10.1002/hbm.25354] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/31/2020] [Revised: 01/07/2021] [Accepted: 01/13/2021] [Indexed: 01/21/2023]
Abstract
The Enhancing NeuroImaging Genetics through Meta-Analysis copy number variant (ENIGMA-CNV) and 22q11.2 Deletion Syndrome Working Groups (22q-ENIGMA WGs) were created to gain insight into the involvement of genetic factors in human brain development and related cognitive, psychiatric and behavioral manifestations. To that end, the ENIGMA-CNV WG has collated CNV and magnetic resonance imaging (MRI) data from ~49,000 individuals across 38 global research sites, yielding one of the largest studies to date on the effects of CNVs on brain structures in the general population. The 22q-ENIGMA WG includes 12 international research centers that assessed over 533 individuals with a confirmed 22q11.2 deletion syndrome, 40 with 22q11.2 duplications, and 333 typically developing controls, creating the largest-ever 22q11.2 CNV neuroimaging data set. In this review, we outline the ENIGMA infrastructure and procedures for multi-site analysis of CNVs and MRI data. So far, ENIGMA has identified effects of the 22q11.2, 16p11.2 distal, 15q11.2, and 1q21.1 distal CNVs on subcortical and cortical brain structures. Each CNV is associated with differences in cognitive, neurodevelopmental and neuropsychiatric traits, with characteristic patterns of brain structural abnormalities. Evidence of gene-dosage effects on distinct brain regions also emerged, providing further insight into genotype-phenotype relationships. Taken together, these results offer a more comprehensive picture of molecular mechanisms involved in typical and atypical brain development. This "genotype-first" approach also contributes to our understanding of the etiopathogenesis of brain disorders. Finally, we outline future directions to better understand effects of CNVs on brain structure and behavior.
Affiliation(s)
- Ida E. Sønderby
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, Oslo University Hospital and University of Oslo, Oslo, Norway
- KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
| | - Christopher R. K. Ching
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, California, USA
| | - Sophia I. Thomopoulos
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, California, USA
| | - Dennis van der Meer
- Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, Oslo University Hospital and University of Oslo, Oslo, Norway
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | - Daqiang Sun
- Semel Institute for Neuroscience and Human Behavior, Departments of Psychiatry and Biobehavioral Sciences and Psychology, University of California Los Angeles, Los Angeles, California, USA
- Department of Mental Health, Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, California, USA
| | - Julio E. Villalon‐Reina
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, California, USA
| | - Ingrid Agartz
- NORMENT, Institute of Clinical Psychiatry, University of Oslo, Oslo, Norway
- Department of Psychiatric Research, Diakonhjemmet Hospital, Oslo, Norway
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
| | - Katrin Amunts
- Institute of Neuroscience and Medicine (INM‐1), Research Centre Jülich, Jülich, Germany
- Cecile and Oskar Vogt Institute for Brain Research, Medical Faculty, University Hospital Düsseldorf, Heinrich‐Heine‐University Düsseldorf, Düsseldorf, Germany
| | - Celso Arango
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañon, IsSGM, Universidad Complutense, School of Medicine, Madrid, Spain
- Centro Investigación Biomédica en Red de Salud Mental (CIBERSAM), Madrid, Spain
| | | | - Rosa Ayesa‐Arriola
- Centro Investigación Biomédica en Red de Salud Mental (CIBERSAM), Madrid, Spain
- Department of Psychiatry, Marqués de Valdecilla University Hospital, Valdecilla Biomedical Research Institute (IDIVAL), Santander, Spain
| | - Geor Bakker
- Department of Psychiatry and Neuropsychology, Maastricht University, Maastricht, The Netherlands
- Department of Radiology and Nuclear Medicine, VU University Medical Center, Amsterdam, The Netherlands
| | - Anne S. Bassett
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Dalglish Family 22q Clinic for Adults with 22q11.2 Deletion Syndrome, Toronto General Hospital, University Health Network, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Dorret I. Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health (APH) Research Institute, Amsterdam UMC, Amsterdam, The Netherlands
| | - Robin Bülow
- Institute of Diagnostic Radiology and Neuroradiology, University Medicine Greifswald, Greifswald, Germany
| | - Nancy J. Butcher
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Child Health Evaluative Sciences, The Hospital for Sick Children Research Institute, Toronto, Ontario, Canada
| | - Vince D. Calhoun
- Tri‐institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, Georgia, USA
| | - Svenja Caspers
- Institute of Neuroscience and Medicine (INM‐1), Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty & University Hospital Düsseldorf, University of Düsseldorf, Düsseldorf, Germany
| | - Eva W. C. Chow
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Sven Cichon
- Institute of Neuroscience and Medicine (INM‐1), Research Centre Jülich, Jülich, Germany
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Simone Ciufolini
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Michael C. Craig
- Department of Forensic and Neurodevelopmental Sciences, The Sackler Institute for Translational Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College, London, United Kingdom
| | | | - Adam C. Cunningham
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Anders M. Dale
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, California, USA
- Department of Radiology, University of California San Diego, La Jolla, California, USA
| | - Paola Dazzan
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Greig I. de Zubicaray
- Faculty of Health, Queensland University of Technology (QUT), Brisbane, Queensland, Australia
| | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- NORMENT, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Joanne L. Doherty
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
- Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff, United Kingdom
| | - Gary Donohoe
- Center for Neuroimaging, Genetics and Genomics, School of Psychology, NUI Galway, Galway, Ireland
| | - Bogdan Draganski
- LREN, Centre for Research in Neuroscience, Department of Neuroscience, University Hospital Lausanne and University Lausanne, Lausanne, Switzerland
- Neurology Department, Max‐Planck Institute for Human Brain and Cognitive Sciences, Leipzig, Germany
| | - Courtney A. Durdle
- MIND Institute and Department of Psychiatry and Behavioral Sciences, University of California Davis, Davis, California, USA
| | - Stefan Ehrlich
- Division of Psychological and Social Medicine and Developmental Neurosciences, Faculty of Medicine, TU Dresden, Dresden, Germany
| | - Beverly S. Emanuel
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Thomas Espeseth
- Department of Psychology, University of Oslo, Oslo, Norway
- Department of Psychology, Bjørknes College, Oslo, Norway
| | - Simon E. Fisher
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - David C. Glahn
- Tommy Fuss Center for Neuropsychiatric Disease Research, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts, USA
| | - Hans J. Grabe
- German Center for Neurodegenerative Diseases (DZNE), Site Rostock/Greifswald, Greifswald, Germany
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
| | - Raquel E. Gur
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Youth Suicide Prevention, Intervention and Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Boris A. Gutman
- Medical Imaging Research Center, Department of Biomedical Engineering, Illinois Institute of Technology, Chicago, Illinois, USA
| | - Jan Haavik
- Department of Biomedicine, University of Bergen, Bergen, Norway
- Division of Psychiatry, Haukeland University Hospital, Bergen, Norway
| | - Asta K. Håberg
- Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Radiology and Nuclear Medicine, St. Olavs Hospital, Trondheim, Norway
| | - Laura A. Hansen
- Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, California, USA
| | - Ryota Hashimoto
- Department of Pathology of Mental Diseases, National Institute of Mental Health, National Center of Neurology and Psychiatry, Tokyo, Japan
- Department of Psychiatry, Osaka University Graduate School of Medicine, Osaka, Japan
| | - Derrek P. Hibar
- Personalized Healthcare Analytics, Genentech, Inc., South San Francisco, California, USA
| | - Avram J. Holmes
- Department of Psychology, Yale University, New Haven, Connecticut, USA
- Department of Psychiatry, Yale University, New Haven, Connecticut, USA
| | - Jouke‐Jan Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Hilleke E. Hulshoff Pol
- Department of Psychiatry, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
| | | | - Emma E. M. Knowles
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Department of Psychiatry, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Leila Kushan
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
| | - David E. J. Linden
- School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, United Kingdom
| | - Jingyu Liu
- Tri‐institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State, Georgia Tech, Emory, Atlanta, Georgia, USA
- Computer Science, Georgia State University, Atlanta, Georgia, USA
| | - Astri J. Lundervold
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
| | - Sandra Martin‐Brevet
- LREN, Centre for Research in Neuroscience, Department of Neuroscience, University Hospital Lausanne and University Lausanne, Lausanne, Switzerland
| | - Kenia Martínez
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañon, IsSGM, Universidad Complutense, School of Medicine, Madrid, Spain
- Centro Investigación Biomédica en Red de Salud Mental (CIBERSAM), Madrid, Spain
- Facultad de Psicología, Universidad Autónoma de Madrid, Madrid, Spain
| | - Karen A. Mather
- Centre for Healthy Brain Ageing (CHeBA), School of Psychiatry, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Neuroscience Research Australia, Sydney, New South Wales, Australia
| | - Samuel R. Mathias
- Department of Psychiatry, Harvard Medical School, Boston, Massachusetts, USA
- Department of Psychiatry, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Donna M. McDonald‐McGinn
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Division of Human Genetics and 22q and You Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Allan F. McRae
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - Sarah E. Medland
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Torgeir Moberget
- Department of Psychology, Faculty of Social Sciences, University of Oslo, Oslo, Norway
| | - Claudia Modenato
- LREN, Centre for Research in Neuroscience, Department of Neuroscience, University Hospital Lausanne and University Lausanne, Lausanne, Switzerland
- University of Lausanne, Lausanne, Switzerland
| | - Jennifer Monereo Sánchez
- School for Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, The Netherlands
| | - Clara A. Moreau
- Sainte Justine Hospital Research Center, University of Montreal, Montreal, QC, Canada
| | - Thomas W. Mühleisen
- Institute of Neuroscience and Medicine (INM‐1), Research Centre Jülich, Jülich, Germany
- Cecile and Oskar Vogt Institute for Brain Research, Medical Faculty, University Hospital Düsseldorf, Heinrich‐Heine‐University Düsseldorf, Düsseldorf, Germany
- Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Tomas Paus
- Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada
- Departments of Psychology and Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Zdenka Pausova
- Translational Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Carlos Prieto
- Bioinformatics Service, Nucleus, University of Salamanca, Salamanca, Spain
| | | | - Céline S. Reinbold
- Department of Biomedicine, University of Basel, Basel, Switzerland
- Centre for Lifespan Changes in Brain and Cognition, Department of Psychology, University of Oslo, Oslo, Norway
| | - Tiago Reis Marques
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Psychiatric Imaging Group, MRC London Institute of Medical Sciences (LMS), Hammersmith Hospital, Imperial College London, London, United Kingdom
| | - Gabriela M. Repetto
- Center for Genetics and Genomics, Facultad de Medicina, Clinica Alemana Universidad del Desarrollo, Santiago, Chile
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - David R. Roalf
- Department of Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | | | - James J. Rucker
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Perminder S. Sachdev
- Centre for Healthy Brain Ageing (CHeBA), School of Psychiatry, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
- Neuropsychiatric Institute, The Prince of Wales Hospital, Sydney, New South Wales, Australia
| | - James E. Schmitt
- Department of Radiology and Psychiatry, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Peter R. Schofield
- Neuroscience Research Australia, Sydney, New South Wales, Australia
- School of Medical Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Ana I. Silva
- Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, United Kingdom
- School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
| | | | - Dan J. Stein
- SA MRC Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
| | - Christian K. Tamnes
- Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Psychiatric Research, Diakonhjemmet Hospital, Oslo, Norway
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
| | - Diana Tordesillas‐Gutiérrez
- Centro Investigación Biomédica en Red de Salud Mental (CIBERSAM), Madrid, Spain
- Neuroimaging Unit, Technological Facilities, Valdecilla Biomedical Research Institute (IDIVAL), Santander, Spain
| | - Magnus O. Ulfarsson
- Population Genomics, deCODE genetics/Amgen, Reykjavik, Iceland
- Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland
| | - Ariana Vajdi
- Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
| | - Dennis van 't Ent
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Marianne B. M. van den Bree
- MRC Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, United Kingdom
| | - Evangelos Vassos
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
| | - Javier Vázquez‐Bourgon
- Centro Investigación Biomédica en Red de Salud Mental (CIBERSAM), Madrid, Spain
- Department of Psychiatry, Marqués de Valdecilla University Hospital, Valdecilla Biomedical Research Institute (IDIVAL), Santander, Spain
- School of Medicine, University of Cantabria, Santander, Spain
| | - Fidel Vila‐Rodriguez
- Department of Psychiatry, The University of British Columbia, Vancouver, British Columbia, Canada
| | - G. Bragi Walters
- Population Genomics, deCODE genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Wei Wen
- Centre for Healthy Brain Ageing (CHeBA), School of Psychiatry, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia
| | - Lars T. Westlye
- KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
| | - Katharina Wittfeld
- German Center for Neurodegenerative Diseases (DZNE), Site Rostock/Greifswald, Greifswald, Germany
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
| | - Elaine H. Zackai
- Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - Kári Stefánsson
- Population Genomics, deCODE genetics/Amgen, Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
| | - Sebastien Jacquemont
- Sainte Justine Hospital Research Center, University of Montreal, Montreal, QC, Canada
- Department of Pediatrics, University of Montreal, Montreal, QC, Canada
| | - Paul M. Thompson
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, California, USA
| | - Carrie E. Bearden
- Semel Institute for Neuroscience and Human Behavior, Departments of Psychiatry and Biobehavioral Sciences and Psychology, University of California Los Angeles, Los Angeles, California, USA
- Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, California, USA
| | - Ole A. Andreassen
- Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, Oslo University Hospital and University of Oslo, Oslo, Norway
| |
|
21
|
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ivan A Alexandrov
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia; Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
| |
|
22
|
Mostovoy Y, Yilmaz F, Chow SK, Chu C, Lin C, Geiger EA, Meeks NJL, Chatfield KC, Coughlin CR, Surti U, Kwok PY, Shaikh TH. Genomic regions associated with microdeletion/microduplication syndromes exhibit extreme diversity of structural variation. Genetics 2021; 217:6066166. [PMID: 33724415 DOI: 10.1093/genetics/iyaa038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/05/2020] [Accepted: 12/18/2020] [Indexed: 11/12/2022]
Abstract
Segmental duplications (SDs) are a class of long, repetitive DNA elements whose paralogs share a high level of sequence similarity with each other. SDs mediate chromosomal rearrangements that lead to structural variation in the general population as well as genomic disorders associated with multiple congenital anomalies, including the 7q11.23 (Williams-Beuren Syndrome, WBS), 15q13.3, and 16p12.2 microdeletion syndromes. Population-level characterization of SDs has generally been lacking because most techniques used for analyzing these complex regions are both labor- and cost-intensive. In this study, we have used a high-throughput technique to genotype complex structural variation with a single-molecule, long-range optical mapping approach. We characterized SDs and identified novel structural variants (SVs) at 7q11.23, 15q13.3, and 16p12.2 using optical mapping data from 154 phenotypically normal individuals from 26 populations comprising five super-populations. We detected several novel SVs for each locus, some of which had significantly different prevalence between populations. Additionally, we localized the microdeletion breakpoints to specific paralogous duplicons located within complex SDs in two patients with WBS, one patient with 15q13.3, and one patient with 16p12.2 microdeletion syndromes. The population-level data presented here highlight the extreme diversity of large and complex SVs within SD-containing regions. The approach we outline will greatly facilitate the investigation of the role of inter-SD structural variation as a driver of chromosomal rearrangements and genomic disorders.
Affiliation(s)
- Yulia Mostovoy
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
| | - Feyza Yilmaz
- Department of Integrative Biology, University of Colorado Denver, Denver, CO 80204, USA; Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Stephen K Chow
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
| | - Catherine Chu
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
| | - Chin Lin
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA
| | - Elizabeth A Geiger
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Naomi J L Meeks
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Kathryn C Chatfield
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Pediatrics, Section of Cardiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Curtis R Coughlin
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Pui-Yan Kwok
- Cardiovascular Research Institute, UCSF School of Medicine, San Francisco, CA 94143, USA; Department of Dermatology, UCSF School of Medicine, San Francisco, CA 94143, USA; Institute for Human Genetics, UCSF School of Medicine, San Francisco, CA 94143, USA
| | - Tamim H Shaikh
- Department of Pediatrics, Section of Clinical Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO 80045, USA
| |
|
23
|
Li H, Dawood M, Khayat MM, Farek JR, Jhangiani SN, Khan ZM, Mitani T, Coban-Akdemir Z, Lupski JR, Venner E, Posey JE, Sabo A, Gibbs RA. Exome variant discrepancies due to reference-genome differences. Am J Hum Genet 2021; 108:1239-1250. [PMID: 34129815 PMCID: PMC8322936 DOI: 10.1016/j.ajhg.2021.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 01/29/2021] [Accepted: 05/19/2021] [Indexed: 12/15/2022]
Abstract
Despite release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains more widely used by most research and clinical laboratories. To date, no study has quantified the impact of utilizing different reference assemblies for the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertion-deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that a total of 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. In addition, variant interpretations could also be influenced by the reference after lifting-over variant loci to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
Affiliation(s)
- He Li
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moez Dawood
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jesse R Farek
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ziad M Khan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Tadahiro Mitani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zeynep Coban-Akdemir
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - James R Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Texas Children's Hospital, Houston, TX 77030, USA
| | - Eric Venner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
| |
|
24
|
Sønderby IE, van der Meer D, Moreau C, Kaufmann T, Walters GB, Ellegaard M, Abdellaoui A, Ames D, Amunts K, Andersson M, Armstrong NJ, Bernard M, Blackburn NB, Blangero J, Boomsma DI, Brodaty H, Brouwer RM, Bülow R, Bøen R, Cahn W, Calhoun VD, Caspers S, Ching CRK, Cichon S, Ciufolini S, Crespo-Facorro B, Curran JE, Dale AM, Dalvie S, Dazzan P, de Geus EJC, de Zubicaray GI, de Zwarte SMC, Desrivieres S, Doherty JL, Donohoe G, Draganski B, Ehrlich S, Eising E, Espeseth T, Fejgin K, Fisher SE, Fladby T, Frei O, Frouin V, Fukunaga M, Gareau T, Ge T, Glahn DC, Grabe HJ, Groenewold NA, Gústafsson Ó, Haavik J, Haberg AK, Hall J, Hashimoto R, Hehir-Kwa JY, Hibar DP, Hillegers MHJ, Hoffmann P, Holleran L, Holmes AJ, Homuth G, Hottenga JJ, Hulshoff Pol HE, Ikeda M, Jahanshad N, Jockwitz C, Johansson S, Jönsson EG, Jørgensen NR, Kikuchi M, Knowles EEM, Kumar K, Le Hellard S, Leu C, Linden DEJ, Liu J, Lundervold A, Lundervold AJ, Maillard AM, Martin NG, Martin-Brevet S, Mather KA, Mathias SR, McMahon KL, McRae AF, Medland SE, Meyer-Lindenberg A, Moberget T, Modenato C, Sánchez JM, Morris DW, Mühleisen TW, Murray RM, Nielsen J, Nordvik JE, Nyberg L, Loohuis LMO, Ophoff RA, Owen MJ, Paus T, Pausova Z, Peralta JM, Pike GB, Prieto C, Quinlan EB, Reinbold CS, Marques TR, Rucker JJH, Sachdev PS, Sando SB, Schofield PR, Schork AJ, Schumann G, Shin J, Shumskaya E, Silva AI, Sisodiya SM, Steen VM, Stein DJ, Strike LT, Suzuki IK, Tamnes CK, Teumer A, Thalamuthu A, Tordesillas-Gutiérrez D, Uhlmann A, Ulfarsson MO, van 't Ent D, van den Bree MBM, Vanderhaeghen P, Vassos E, Wen W, Wittfeld K, Wright MJ, Agartz I, Djurovic S, Westlye LT, Stefansson H, Stefansson K, Jacquemont S, Thompson PM, Andreassen OA. 1q21.1 distal copy number variants are associated with cerebral and cognitive alterations in humans. Transl Psychiatry 2021; 11:182. 
[PMID: 33753722 PMCID: PMC7985307 DOI: 10.1038/s41398-021-01213-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/26/2020] [Revised: 12/23/2020] [Accepted: 01/08/2021] [Indexed: 01/07/2023]
Abstract
Low-frequency 1q21.1 distal deletion and duplication copy number variant (CNV) carriers are predisposed to multiple neurodevelopmental disorders, including schizophrenia, autism and intellectual disability. Deletion and duplication carriers display a high prevalence of microcephaly and macrocephaly, respectively. The underlying brain structural diversity remains largely unknown. We systematically called CNVs in 38 cohorts from the large-scale ENIGMA-CNV collaboration and the UK Biobank and identified 28 1q21.1 distal deletion and 22 duplication carriers and 37,088 non-carriers (48% male) derived from 15 distinct magnetic resonance imaging scanner sites. With standardized methods, we compared subcortical and cortical brain measures (all cohorts) and cognitive performance (UK Biobank only) between carrier groups, also testing for mediation of brain structure on cognition. We identified positive dosage effects of copy number on intracranial volume (ICV) and total cortical surface area, with the largest effects in frontal and cingulate cortices, and negative dosage effects on caudate and hippocampal volumes. The carriers displayed distinct cognitive deficit profiles in cognitive tasks from the UK Biobank, with intermediate decreases in duplication carriers and somewhat larger decreases in deletion carriers, the latter potentially mediated by ICV or cortical surface area. These results shed light on pathobiological mechanisms of neurodevelopmental disorders by demonstrating gene dosage effects on specific brain structures and effects on cognitive function.
Affiliation(s)
- Ida E Sønderby
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
| | - Dennis van der Meer
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
| | - Clara Moreau
- Sainte Justine Hospital Research Center, Montreal, Quebec, Canada
- Centre de recherche de l'Institut universitaire de gériatrie de Montréal, Montreal, Quebec, Canada
| | - Tobias Kaufmann
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Psychiatry and Psychotherapy, University of Tübingen, Tübingen, Germany
| | - G Bragi Walters
- deCODE Genetics (Amgen), Reykjavík, Iceland
- Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| | - Maria Ellegaard
- Department of Clinical Biochemistry, Copenhagen University Hospital, Rigshospitalet, Glostrup, Denmark
| | - Abdel Abdellaoui
- Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
- Department of Biological Psychology and Netherlands Twin Register, VU University Amsterdam, Amsterdam, the Netherlands
| | - David Ames
- University of Melbourne Academic Unit for Psychiatry of Old Age, Kew, Australia
- National Ageing Research Institute, Parkville, Australia
| | - Katrin Amunts
- Institute of Neuroscience and Medicine, INM-1, Research Centre Jülich, Jülich, Germany
- C. and O. Vogt Institute for Brain Research, Medical Faculty, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Micael Andersson
- Umeå Centre for Functional Brain Imaging, Umeå University, Umeå, Sweden
- Department of Integrative Medical Biology, Umeå University, Umeå, Sweden
| | | | - Manon Bernard
- Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Nicholas B Blackburn
- South Texas Diabetes and Obesity Institute, Department of Human Genetics, School of Medicine, University of Texas Rio Grande Valley, Brownsville, USA
| | - John Blangero
- South Texas Diabetes and Obesity Institute, Department of Human Genetics, School of Medicine, University of Texas Rio Grande Valley, Brownsville, USA
| | - Dorret I Boomsma
- Department of Biological Psychology and Netherlands Twin Register, VU University Amsterdam, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Amsterdam, the Netherlands
- Amsterdam Public Health Research Institute, VU Medical Center, Amsterdam, the Netherlands
| | - Henry Brodaty
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Dementia Centre for Research Collaboration, School of Psychiatry, University of New South Wales, Sydney, Australia
| | - Rachel M Brouwer
- Department of Psychiatry, University Medical Center Brain Center, Utrecht University, Utrecht, the Netherlands
| | - Robin Bülow
- Institute of Diagnostic Radiology and Neuroradiology, University Medicine Greifswald, Greifswald, Germany
| | - Rune Bøen
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
| | - Wiepke Cahn
- Department of Psychiatry, University Medical Center Brain Center, Utrecht University, Utrecht, the Netherlands
- Altrecht Science, Utrecht, the Netherlands
| | - Vince D Calhoun
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, USA
- The Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, USA
| | - Svenja Caspers
- Institute of Neuroscience and Medicine, INM-1, Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Christopher R K Ching
- Imaging Genetics Center, Mark and Mary Stevens Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, USA
| | - Sven Cichon
- Institute of Neuroscience and Medicine, INM-1, Research Centre Jülich, Jülich, Germany
- Department of Biomedicine, University of Basel, Basel, Switzerland
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
| | - Simone Ciufolini
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Benedicto Crespo-Facorro
- University Hospital Marqués de Valdecilla, IDIVAL, Centro de Investigación Biomédica en Red Salud Mental (CIBERSAM), Santander, Spain
- University Hospital Virgen del Rocío, IBiS, Centro de Investigación Biomédica en Red Salud Mental (CIBERSAM), Sevilla, Spain
| | - Joanne E Curran
- South Texas Diabetes and Obesity Institute, Department of Human Genetics, School of Medicine, University of Texas Rio Grande Valley, Brownsville, USA
| | - Anders M Dale
- Center for Multimodal Imaging and Genetics, University of California, San Diego, USA
| | - Shareefa Dalvie
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, Western Cape, South Africa
| | - Paola Dazzan
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Eco J C de Geus
- Department of Biological Psychology and Netherlands Twin Register, VU University Amsterdam, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Amsterdam, the Netherlands
- Amsterdam Public Health Research Institute, VU Medical Center, Amsterdam, the Netherlands
| | | | - Sonja M C de Zwarte
- Department of Psychiatry, University Medical Center Brain Center, Utrecht University, Utrecht, the Netherlands
| | - Sylvane Desrivieres
- Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Joanne L Doherty
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
- Cardiff University Brain Research Imaging Centre School of Psychology, Cardiff University, Cardiff, United Kingdom
| | - Gary Donohoe
- Centre for Neuroimaging and Cognitive Genomics, School of Psychology and Discipline of Biochemistry, National University of Ireland Galway, Galway, Ireland
| | - Bogdan Draganski
- Laboratory for Research in Neuroimaging LREN, Centre for Research in Neurosciences, Department of Clinical Neurosciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Neurology Department, Max-Planck-Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Stefan Ehrlich
- Division of Psychological and Social Medicine, Faculty of Medicine, TU Dresden, Dresden, Germany
| | - Else Eising
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
| | - Thomas Espeseth
- Department of Psychology, University of Oslo, Oslo, Norway
- Bjørknes College, Oslo, Norway
| | - Kim Fejgin
- Signal Transduction, H. Lundbeck A/S, Ottiliavej 9, DK-2500, Valby, Denmark
| | - Simon E Fisher
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Tormod Fladby
- Department of Neurology, Akershus University Hospital, 1474, Nordbyhagen, Norway
- Institute of Clinical Medicine, Campus Ahus, University of Oslo, Oslo, Norway
| | - Oleksandr Frei
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Vincent Frouin
- Université Paris-Saclay, CEA, Neurospin, 91191, Gif-sur-Yvette, France
| | - Masaki Fukunaga
- Division of Cerebral Integration, National Institute for Physiological Sciences, Okazaki, Japan
- Department of Life Science, Sokendai, Hayama, Japan
| | - Thomas Gareau
- Université Paris-Saclay, CEA, Neurospin, 91191, Gif-sur-Yvette, France
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - David C Glahn
- Boston Children's Hospital, Boston, Massachusetts, USA
- Institute of Living, Hartford, Connecticut, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Hans J Grabe
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- German Center of Neurodegenerative Diseases (DZNE), Rostock/Greifswald, Greifswald, Germany
| | - Nynke A Groenewold
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, Western Cape, South Africa
| | | | - Jan Haavik
- Department of Biomedicine, University of Bergen, Bergen, Norway
- Division of Psychiatry, Haukeland University Hospital, Bergen, Norway
| | - Asta K Haberg
- Department of Neuromedicine and Movement Science, Norwegian University of Science and Technology, Trondheim, Norway
- St Olav's Hospital, Department of Radiology and Nuclear Medicine, Trondheim, Norway
| | - Jeremy Hall
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
- School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Ryota Hashimoto
- Department of Pathology of Mental Diseases, National Institute of Mental Health, National Center of Neurology and Psychiatry, Kodaira, Japan
- Osaka University, Osaka, Japan
| | - Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, the Netherlands
| | | | - Manon H J Hillegers
- Department of Child and Adolescent Psychiatry/Psychology, Erasmus MC-Sophia, Rotterdam, the Netherlands
| | - Per Hoffmann
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Institute of Human Genetics, University of Bonn Medical School, Bonn, Germany
| | - Laurena Holleran
- Centre for Neuroimaging and Cognitive Genomics, School of Psychology and Discipline of Biochemistry, National University of Ireland Galway, Galway, Ireland
| | - Avram J Holmes
- Psychology Department, Yale University, New Haven, CT, USA
- Department of Psychiatry, Yale University, New Haven, CT, USA
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
| | - Georg Homuth
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany
| | - Jouke-Jan Hottenga
- Department of Biological Psychology and Netherlands Twin Register, VU University Amsterdam, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Amsterdam, the Netherlands
- Amsterdam Public Health Research Institute, VU Medical Center, Amsterdam, the Netherlands
| | - Hilleke E Hulshoff Pol
- Department of Psychiatry, University Medical Center Brain Center, Utrecht University, Utrecht, the Netherlands
| | - Masashi Ikeda
- Department of Psychiatry, Fujita Health University School of Medicine, Toyoake, Japan
| | - Neda Jahanshad
- Imaging Genetics Center, Mark and Mary Stevens Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, USA
| | - Christiane Jockwitz
- Institute of Neuroscience and Medicine, INM-1, Research Centre Jülich, Jülich, Germany
- Institute for Anatomy I, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Stefan Johansson
- Department of Clinical Science, University of Bergen, Bergen, Norway
- Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
| | - Erik G Jönsson
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Stockholm Region, Stockholm, Sweden
- Norwegian Centre for Mental Disorders Research (NORMENT), Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Niklas R Jørgensen
- Department of Clinical Biochemistry, Copenhagen University Hospital Rigshospitalet, Glostrup, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Masataka Kikuchi
- Department of Genome Informatics, Graduate School of Medicine, Osaka University, Osaka, Japan
| | - Emma E M Knowles
- Boston Children's Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Kuldeep Kumar
- Sainte Justine Hospital Research Center, Montreal, Quebec, Canada
| | - Stephanie Le Hellard
- Norwegian Centre for Mental Disorders Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Dr Einar Martens Research Group for Biological Psychiatry, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
| | - Costin Leu
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States
- Chalfont Centre for Epilepsy, Chalfont-St-Peter, United Kingdom
| | - David E J Linden
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
| | - Jingyu Liu
- Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, USA
| | - Arvid Lundervold
- Department of Biomedicine, University of Bergen, Bergen, Norway
- Mohn Medical Imaging and Visualization Centre, Department of Radiology, Haukeland University Hospital, Bergen, Norway
| | | | - Anne M Maillard
- Service des Troubles du Spectre de l'Autisme et apparentés, Lausanne University Hospital, Lausanne, Switzerland
| | - Nicholas G Martin
- Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Sandra Martin-Brevet
- Laboratory for Research in Neuroimaging LREN, Centre for Research in Neurosciences, Department of Clinical Neurosciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
| | - Karen A Mather
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Neuroscience Research Australia, Randwick, Australia
| | - Samuel R Mathias
- Boston Children's Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Katie L McMahon
- Herston Imaging Research Facility and School of Clinical Sciences, Queensland University of Technology, Brisbane, Australia
| | - Allan F McRae
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, Australia
| | - Sarah E Medland
- Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Andreas Meyer-Lindenberg
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
| | - Torgeir Moberget
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
| | - Claudia Modenato
- Laboratory for Research in Neuroimaging LREN, Centre for Research in Neurosciences, Department of Clinical Neurosciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- University of Lausanne, Lausanne, Switzerland
| | - Jennifer Monereo Sánchez
- Department of Radiology and Nuclear Medicine, Maastricht University Medical Center, Maastricht, the Netherlands
- School for Mental Health and Neuroscience, Maastricht University, Maastricht, the Netherlands
| | - Derek W Morris
- Centre for Neuroimaging and Cognitive Genomics, School of Psychology and Discipline of Biochemistry, National University of Ireland Galway, Galway, Ireland
| | - Thomas W Mühleisen
- Institute of Neuroscience and Medicine, INM-1, Research Centre Jülich, Jülich, Germany
- C. and O. Vogt Institute for Brain Research, Medical Faculty, University Hospital Düsseldorf, Heinrich Heine University Duesseldorf, Düsseldorf, Germany
- Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Robin M Murray
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Jacob Nielsen
- Signal Transduction, H. Lundbeck A/S, Ottiliavej 9, DK-2500, Valby, Denmark
| | | | - Lars Nyberg
- Umeå Centre for Functional Brain Imaging, Umeå University, Umeå, Sweden
- Department of Integrative Medical Biology, Umeå University, Umeå, Sweden
- Department of Radiation Sciences, Umeå University, Umeå, Sweden
| | - Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, University of California, Los Angeles, USA
| | - Roel A Ophoff
- Center for Neurobehavioral Genetics, University of California, Los Angeles, USA
- Department of Psychiatry, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Michael J Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
| | - Tomas Paus
- Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada
- Physiology and Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Zdenka Pausova
- Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada
- Physiology and Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Juan M Peralta
- South Texas Diabetes and Obesity Institute, Department of Human Genetics, School of Medicine, University of Texas Rio Grande Valley, Brownsville, USA
| | - G Bruce Pike
- Departments of Radiology and Clinical Neurosciences, University of Calgary, Calgary, Alberta, Canada
| | - Carlos Prieto
- Bioinformatics Service, Nucleus, University of Salamanca, Salamanca, Spain
| | - Erin B Quinlan
- Centre for Population Neuroscience and Precision Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Céline S Reinbold
- Department of Biomedicine, University of Basel, Basel, Switzerland
- Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Department of Psychology, University of Oslo, Oslo, Norway
| | - Tiago Reis Marques
- Department of Psychosis, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
- Psychiatric Imaging Group, MRC London Institute of Medical Sciences (LMS), Hammersmith Hospital, Imperial College, London, United Kingdom
| | - James J H Rucker
- Institute of Psychiatry, Psychology and Neuroscience, London, London, United Kingdom
| | - Perminder S Sachdev
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
- Neuropsychiatric Institute, The Prince of Wales Hospital, Sydney, Australia
| | - Sigrid B Sando
- Department of Neuromedicine and Movement Science, Norwegian University of Science and Technology, Trondheim, Norway
- University Hospital of Trondheim, Department of Neurology and Clinical Neurophysiology, Trondheim, Norway
| | - Peter R Schofield
- Neuroscience Research Australia, Sydney, Australia
- School of Medical Sciences, University of New South Wales, Sydney, Australia
| | - Andrew J Schork
- Institute of Biological Psychiatry, Roskilde, Denmark
- The Translational Genetics Institute (TGEN), Phoenix, AZ, United States
| | - Gunter Schumann
- Centre for Population Neuroscience and Precision Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
| | - Jean Shin
- Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada
- Physiology and Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Elena Shumskaya
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Ana I Silva
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, the Netherlands
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
- Cardiff University Brain Research Imaging Centre School of Psychology, Cardiff University, Cardiff, United Kingdom
| | - Sanjay M Sisodiya
- Department of Clinical and Experimental Epilepsy, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
- Chalfont Centre for Epilepsy, Chalfont-St-Peter, United Kingdom
| | - Vidar M Steen
- Norwegian Centre for Mental Disorders Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Dr Einar Martens Research Group for Biological Psychiatry, Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
| | - Dan J Stein
- South African Medical Research Council Unit on Risk and Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
| | - Lachlan T Strike
- Queensland Brain Institute, University of Queensland, Brisbane, Australia
| | - Ikuo K Suzuki
- VIB Center for Brain & Disease Research, Stem Cell and Developmental Neurobiology Lab, Leuven, Belgium
- University of Brussels (ULB), Institute of Interdisciplinary Research (IRIBHM) ULB Neuroscience Institute, Brussels, Belgium
- The University of Tokyo, Department of Biological Sciences, Graduate School of Science, Tokyo, Japan
| | - Christian K Tamnes
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Department of Psychiatry, Diakonhjemmet Hospital, Oslo, Norway
| | - Alexander Teumer
- Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
| | - Anbupalam Thalamuthu
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
| | - Diana Tordesillas-Gutiérrez
- University Hospital Marqués de Valdecilla, IDIVAL, Centro de Investigación Biomédica en Red Salud Mental (CIBERSAM), Santander, Spain
- Department of Radiology, Marqués de Valdecilla University Hospital, Valdecilla Biomedical Research Institute IDIVAL, Santander, Spain
| | - Anne Uhlmann
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, Western Cape, South Africa
| | - Magnus O Ulfarsson
- deCODE Genetics (Amgen), Reykjavík, Iceland
- Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavík, Iceland
| | - Dennis van 't Ent
- Department of Biological Psychology and Netherlands Twin Register, VU University Amsterdam, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Amsterdam, the Netherlands
| | - Marianne B M van den Bree
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, United Kingdom
- School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Pierre Vanderhaeghen
- VIB-KU Leuven Center for Brain & Disease Research, 3000, Leuven, Belgium
- KU Leuven, Department of Neurosciences & Leuven Brain Institute, 3000, Leuven, Belgium
- Université Libre de Bruxelles (U.L.B.), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070, Brussels, Belgium
| | - Evangelos Vassos
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- National Institute for Health Research, Mental Health Biomedical Research Centre, South London and Maudsley National Health Service Foundation Trust and King's College London, London, United Kingdom
| | - Wei Wen
- Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, Australia
| | - Katharina Wittfeld
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- German Center of Neurodegenerative Diseases (DZNE), Rostock/Greifswald, Greifswald, Germany
| | - Margaret J Wright
- Queensland Brain Institute, University of Queensland, Brisbane, Australia
- Centre for Advanced Imaging, University of Queensland, Brisbane, Australia
| | - Ingrid Agartz
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, & Stockholm Health Care Services, Stockholm Region, Stockholm, Sweden
- Norwegian Centre for Mental Disorders Research (NORMENT), Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Department of Psychiatry, Diakonhjemmet Hospital, Oslo, Norway
| | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Norwegian Centre for Mental Disorders Research, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Lars T Westlye
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
| | | | - Kari Stefansson
- deCODE Genetics (Amgen), Reykjavík, Iceland
- Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| | - Sébastien Jacquemont
- Sainte Justine Hospital Research Center, Montreal, Quebec, Canada
- Department of Pediatrics, University of Montreal, Montreal, Quebec, Canada
| | - Paul M Thompson
- Imaging Genetics Center, Mark and Mary Stevens Institute for Neuroimaging and Informatics, University of Southern California, Los Angeles, USA
| | - Ole A Andreassen
- NORMENT, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
|
25
|
Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res 2021; 31:380-396. [PMID: 33563718 PMCID: PMC7919458 DOI: 10.1101/gr.263442.120] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/12/2020] [Accepted: 12/28/2020] [Indexed: 12/25/2022]
Abstract
The rapid evolution of repetitive DNA sequences, including satellite DNA, tandem duplications, and transposable elements, underlies phenotypic evolution and contributes to hybrid incompatibilities between species. However, repetitive genomic regions are fragmented and misassembled in most contemporary genome assemblies. We generated highly contiguous de novo reference genomes for the Drosophila simulans species complex (D. simulans, D. mauritiana, and D. sechellia), which speciated ∼250,000 yr ago. Our assemblies are comparable in contiguity and accuracy to the current D. melanogaster genome, allowing us to directly compare repetitive sequences between these four species. We find that at least 15% of the D. simulans complex species genomes fail to align uniquely to D. melanogaster owing to structural divergence, twice the number of single-nucleotide substitutions. We also find rapid turnover of satellite DNA and extensive structural divergence in heterochromatic regions, whereas the euchromatic gene content is mostly conserved. Despite the overall preservation of gene synteny, euchromatin in each species has been shaped by clade- and species-specific inversions, transposable elements, expansions and contractions of satellite and tRNA tandem arrays, and gene duplications. We also find rapid divergence among Y-linked genes, including copy number variation and recent gene duplications from autosomes. Our assemblies provide a valuable resource for studying genome evolution and its consequences for phenotypic evolution in these genetic model species.
Affiliation(s)
- Mahul Chakraborty
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
| | - Ching-Ho Chang
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| | - Danielle E Khost
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- FAS Informatics and Scientific Applications, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Jeffrey Vedanayagam
- Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA
| | - Jeffrey R Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
| | - Kristi L Montooth
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
| | - Colin D Meiklejohn
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
| | | | - J J Emerson
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
|
26
|
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes (Basel) 2020; 11:E1444. [PMID: 33266238 PMCID: PMC7760597 DOI: 10.3390/genes11121444] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/12/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 01/23/2023]
Abstract
Long-read single-molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single-molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual than between SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have a similar performance for SV detection in human whole-genome sequencing data, and that both technologies are feasible for population-scale studies.
Affiliation(s)
- Nazeefa Fatima
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Anna Petri
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Ulf Gyllensten
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Clayton, VIC 3800, Australia
|
27
|
Lee YG, Lee JY, Kim J, Kim YJ. Insertion variants missing in the human reference genome are widespread among human populations. BMC Biol 2020; 18:167. [PMID: 33187521 PMCID: PMC7666470 DOI: 10.1186/s12915-020-00894-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/13/2020] [Accepted: 10/09/2020] [Indexed: 01/07/2023]
Abstract
Background: Structural variants comprise diverse genomic arrangements including deletions, insertions, inversions, and translocations, which can generally be detected in humans through sequence comparison to the reference genome. Among structural variants, insertions are the least frequently identified variants, mainly due to ascertainment bias in the reference genome, lack of previous sequence knowledge, and low complexity of typical insertion sequences. Though recent developments in long-read sequencing deliver promise in annotating individual non-reference insertions, population-level catalogues on non-reference insertion variants have not been identified and the possible functional roles of these hidden variants remain elusive.
Results: To detect non-reference insertion variants, we developed a pipeline, InserTag, which generates non-reference contigs by local de novo assembly and then infers the full sequence of insertion variants by tracing contigs from non-human primates and other human genome assemblies. Application of the pipeline to data from 2535 individuals of the 1000 Genomes Project helped identify 1696 non-reference insertion variants and re-classify the variants as retention of ancestral sequences or novel sequence insertions based on the ancestral state. Genotyping of the variants showed that individuals had, on average, 0.92-Mbp sequences missing from the reference genome, 92% of the variants were common (allele frequency > 5%) among human populations, and more than half of the variants were major alleles. Among human populations, African populations were the most divergent and had the most non-reference sequences, which was attributed to the greater prevalence of high-frequency insertion variants. The subsets of insertion variants were in high linkage disequilibrium with phenotype-associated SNPs and showed signals of recent continent-specific selection.
Conclusions: Non-reference insertion variants represent an important type of genetic variation in the human population, and our developed pipeline, InserTag, provides the frameworks for the detection and genotyping of non-reference sequences missing from human populations.
Supplementary information: Supplementary information accompanies this paper at 10.1186/s12915-020-00894-1.
Affiliation(s)
- Young-Gun Lee
- Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea
| | - Jin-Young Lee
- Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea
| | - Junhyong Kim
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Young-Joon Kim
- Department of Integrated Omics for Biomedical Science, WCU Graduate School, Yonsei University, Seoul, Republic of Korea
- Department of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul, Republic of Korea
|
28
|
Rodriguez OL, Gibson WS, Parks T, Emery M, Powell J, Strahl M, Deikus G, Auckland K, Eichler EE, Marasco WA, Sebra R, Sharp AJ, Smith ML, Bashir A, Watson CT. A Novel Framework for Characterizing Genomic Haplotype Diversity in the Human Immunoglobulin Heavy Chain Locus. Front Immunol 2020; 11:2136. [PMID: 33072076 PMCID: PMC7539625 DOI: 10.3389/fimmu.2020.02136] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 06/10/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023]
Abstract
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.
Affiliation(s)
- Oscar L Rodriguez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - William S Gibson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
| | - Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Matthew Emery
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - James Powell
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Maya Strahl
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Kathryn Auckland
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, United States
| | - Wayne A Marasco
- Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Department of Medicine, Harvard Medical School, Boston, MA, United States
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.,Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Melissa L Smith
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.,Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States.,Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
| | - Ali Bashir
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
| |
|
29
|
Lodewijk GA, Fernandes DP, Vretzakis I, Savage JE, Jacobs FMJ. Evolution of Human Brain Size-Associated NOTCH2NL Genes Proceeds toward Reduced Protein Levels. Mol Biol Evol 2020; 37:2531-2548. [PMID: 32330268 PMCID: PMC7475042 DOI: 10.1093/molbev/msaa104] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
Ever since the availability of genomes from Neanderthals, Denisovans, and ancient humans, the field of evolutionary genomics has been searching for protein-coding variants that may hold clues to how our species evolved over the last ∼600,000 years. In this study, we identify such variants in the human-specific NOTCH2NL gene family, which were recently identified as possible contributors to the evolutionary expansion of the human brain. We find evidence for the existence of unique protein-coding NOTCH2NL variants in Neanderthals and Denisovans which could affect their ability to activate Notch signaling. Furthermore, in the Neanderthal and Denisovan genomes, we find unusual NOTCH2NL configurations, not found in any of the modern human genomes analyzed. Finally, genetic analysis of archaic and modern humans reveals ongoing adaptive evolution of modern human NOTCH2NL genes, identifying three structural variants acting complementary to drive our genome to produce a lower dosage of NOTCH2NL protein. Because copy-number variations of the 1q21.1 locus, encompassing NOTCH2NL genes, are associated with severe neurological disorders, this seemingly contradicting drive toward low levels of NOTCH2NL protein indicates that the optimal dosage of NOTCH2NL may have not yet been settled in the human population.
Affiliation(s)
- Gerrald A Lodewijk
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
| | - Diana P Fernandes
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
| | - Iraklis Vretzakis
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
| | - Jeanne E Savage
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Amsterdam Neuroscience, Complex Trait Genetics
| | - Frank M J Jacobs
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Neuroscience, Complex Trait Genetics
| |
|
30
|
Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton JM, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, Phillippy AM. Telomere-to-telomere assembly of a complete human X chromosome. Nature 2020; 585:79-84. [PMID: 32663838 PMCID: PMC7484160 DOI: 10.1038/s41586-020-2547-7] [Citation(s) in RCA: 396] [Impact Index Per Article: 99.0] [Received: 07/30/2019] [Accepted: 05/29/2020] [Indexed: 12/15/2022]
Abstract
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist [1,2]. Here we present a human genome assembly that surpasses the continuity of GRCh38 [2], along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome [3], we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Edmund Howe
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | | | - Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | | | - Valerie Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amalia Dutra
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Alexander M Chang
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amy B Wilfert
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | | | - Megan Y Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Daniela C Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Ruta Sahasrabudhe
- DNA Technologies Core, Genome Center, University of California Davis, Davis, CA, USA
| | - Gulhan Kaya
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Josh Quick
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nicholas J Loman
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nadine Holmes
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Matthew Loose
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Rosa Ana Risques
- Department of Pathology, University of Washington, Seattle, WA, USA
| | | | - Robert Fulton
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Ira Hall
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Winston Timp
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - James C Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | | | - Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Division of Human Genetics, Duke University Medical Center, Durham, NC, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
31
|
Soylev A, Le TM, Amini H, Alkan C, Hormozdiari F. Discovery of tandem and interspersed segmental duplications using high-throughput sequencing. Bioinformatics 2020; 35:3923-3930. [PMID: 30937433 DOI: 10.1093/bioinformatics/btz237] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/20/2018] [Revised: 01/20/2019] [Accepted: 03/29/2019] [Indexed: 01/01/2023]
Abstract
MOTIVATION Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions, likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of more simple variants. RESULTS We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods to our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, using a 30× coverage TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. 
Furthermore, we showed a surprisingly low false discovery rate of our approach for discovery of tandem, direct and inverted interspersed segmental duplications prediction on CHM1 (<5% for the top 50 predictions). AVAILABILITY AND IMPLEMENTATION TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Arda Soylev
- Department of Computer Engineering, Bilkent University, Ankara.,Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Thong Minh Le
- UC-Davis Genome Center, University of California, Davis, CA, USA.,Department of Computer Science, University of California, Davis, CA, USA
| | - Hajar Amini
- Department of Neurology, School of Medicine, University of California, Davis, CA, USA
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara.,Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey.,Department of Computer Science, ETH Zürich, Zurich, Switzerland
| | - Fereydoun Hormozdiari
- UC-Davis Genome Center, University of California, Davis, CA, USA.,Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA.,MIND Institute, University of California, Davis, CA, USA
| |
|
32
|
Dohm JC, Peters P, Stralis-Pavese N, Himmelbauer H. Benchmarking of long-read correction methods. NAR Genom Bioinform 2020; 2:lqaa037. [PMID: 33575591 PMCID: PMC7671305 DOI: 10.1093/nargab/lqaa037] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 03/29/2020] [Revised: 05/02/2020] [Accepted: 05/15/2020] [Indexed: 01/25/2023]
Abstract
Third-generation sequencing technologies provided by Pacific Biosciences and Oxford Nanopore Technologies generate read lengths in the scale of kilobasepairs. However, these reads display high error rates, and correction steps are necessary to realize their great potential in genomics and transcriptomics. Here, we compare properties of PacBio and Nanopore data and assess correction methods by Canu, MARVEL and proovread in various combinations. We found total error rates of around 13% in the raw datasets. PacBio reads showed a high rate of insertions (around 8%) whereas Nanopore reads showed similar rates for substitutions, insertions and deletions of around 4% each. In data from both technologies the errors were uniformly distributed along reads apart from noisy 5' ends, and homopolymers appeared among the most over-represented kmers relative to a reference. Consensus correction using read overlaps reduced error rates to about 1% when using Canu or MARVEL after patching. The lowest error rate in Nanopore data (0.45%) was achieved by applying proovread on MARVEL-patched data including Illumina short-reads, and the lowest error rate in PacBio data (0.42%) was the result of Canu correction with minimap2 alignment after patching. Our study provides valuable insights and benchmarks regarding long-read data and correction methods.
Affiliation(s)
- Juliane C Dohm
- Institute of Computational Biology, Department of Biotechnology, University of Life Sciences and Natural Resources, Vienna (BOKU), Muthgasse 18, 1190 Vienna, Austria
| | - Philipp Peters
- Institute of Computational Biology, Department of Biotechnology, University of Life Sciences and Natural Resources, Vienna (BOKU), Muthgasse 18, 1190 Vienna, Austria
| | - Nancy Stralis-Pavese
- Institute of Computational Biology, Department of Biotechnology, University of Life Sciences and Natural Resources, Vienna (BOKU), Muthgasse 18, 1190 Vienna, Austria
| | - Heinz Himmelbauer
- Institute of Computational Biology, Department of Biotechnology, University of Life Sciences and Natural Resources, Vienna (BOKU), Muthgasse 18, 1190 Vienna, Austria
| |
|
33
|
Kuhnle A, Mun T, Boucher C, Gagie T, Langmead B, Manzini G. Efficient Construction of a Complete Index for Pan-Genomics Read Alignment. J Comput Biol 2020; 27:500-513. [PMID: 32181684 PMCID: PMC7185338 DOI: 10.1089/cmb.2019.0309] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/19/2022]
Abstract
Short-read aligners predominantly use the FM-index, which is easily able to index one or a few human genomes. However, it does not scale well to indexing collections of thousands of genomes. Driving this issue are the two chief components of the index: (1) a rank data structure over the Burrows–Wheeler Transform (BWT) of the string that will allow us to find the interval in the string's suffix array (SA), and (2) a sample of the SA that—when used with the rank data structure—allows us to access the SA. The rank data structure can be kept small even for large genomic databases, by run-length compressing the BWT, but until recently there was no means known to keep the SA sample small without greatly slowing down access to the SA. Now that (SODA 2018) has defined an SA sample that takes about the same space as the run-length compressed BWT, we have the design for efficient FM-indexes of genomic databases but are faced with the problem of building them. In 2018, we showed how to build the BWT of large genomic databases efficiently (WABI 2018), but the problem of building the sample efficiently was left open. We compare our approach to state-of-the-art methods for constructing the SA sample, and demonstrate that it is the fastest and most space-efficient method on highly repetitive genomic databases. Lastly, we apply our method for indexing partial and whole human genomes and show that it improves over the FM-index-based Bowtie method with respect to both memory and time and over the hybrid index-based CHIC method with respect to query time and memory required for indexing.
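To illustrate why run-length compressing the BWT keeps the rank structure small for highly repetitive collections, here is a minimal Python sketch. It is not the construction method described in the abstract: the function names `bwt`, `run_length_encode` and `rank` are hypothetical, the transform is built by naively sorting rotations (quadratic, suitable only for toy inputs), and `rank` is a plain linear scan standing in for a real succinct rank structure.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform by sorting all rotations (toy-sized inputs only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical characters into (character, count) runs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rank(bwt_string: str, ch: str, i: int) -> int:
    """Number of occurrences of ch in bwt_string[:i] (naive linear scan)."""
    return bwt_string[:i].count(ch)


if __name__ == "__main__":
    repetitive = "ACGTTGCA" * 20  # stand-in for many near-identical genomes
    transformed = bwt(repetitive)
    runs = run_length_encode(transformed)
    print(f"text length: {len(repetitive)}, BWT runs: {len(runs)}")
```

On repetitive input the number of runs grows much more slowly than the text length, which is the property the run-length compressed BWT exploits. The suffix array sample discussed in the abstract is a separate component and is not modeled here.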
Affiliation(s)
- Alan Kuhnle
- Department of Computer Science, Florida State University, Tallahassee, Florida
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida
| | - Taher Mun
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
- Address correspondence to: Taher Mun, PhD Candidate, Department of Computer Science, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218-2682
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida
| | - Travis Gagie
- Faculty of Computer Science, Dalhousie University, Halifax, Canada
- School of Computer Science and Telecommunications, Universidad Diego Portales and CeBiB, Santiago, Chile
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
| | - Giovanni Manzini
- Department of Science and Technological Innovation, University of Eastern Piedmont, Alessandria, Italy
| |
|
34
|
Loss of p57 KIP2 expression confers resistance to contact inhibition in human androgenetic trophoblast stem cells. Proc Natl Acad Sci U S A 2019; 116:26606-26613. [PMID: 31792181 PMCID: PMC6936680 DOI: 10.1073/pnas.1916019116] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/24/2022]
Abstract
Complete hydatidiform moles (CHMs) develop from androgenetic conceptuses and are characterized by enhanced proliferation of trophoblast cells and a significantly higher risk of trophoblast tumors. Loss of the maternal genome and duplication of the paternal genome are considered to be responsible for the phenotype, but the detailed mechanism remains unclear. Here, we report the derivation of trophoblast stem (TS) cells from CHMs. These cells have reduced sensitivity to contact inhibition of cell proliferation and exhibit aberrant expression of imprinted genes, which are expressed from only 1 parental allele. We also reveal that the maternally expressed imprinted gene p57KIP2 would be responsible for the enhanced proliferation of CHM-derived TS cells. Our findings provide an insight into the pathogenesis of CHMs. A complete hydatidiform mole (CHM) is androgenetic in origin and characterized by enhanced trophoblastic proliferation and the absence of fetal tissue. In 15 to 20% of cases, CHMs are followed by malignant gestational trophoblastic neoplasms including choriocarcinoma. Aberrant genomic imprinting may be responsible for trophoblast hypertrophy in CHMs, but the detailed mechanisms are still elusive, partly due to the lack of suitable animal or in vitro models. We recently developed a culture system of human trophoblast stem (TS) cells. In this study, we apply this system to CHMs for a better understanding of their molecular pathology. CHM-derived TS cells, designated as TSmole cells, are morphologically similar to biparental TS (TSbip) cells and express TS-specific markers such as GATA3, KRT7, and TFAP2C. Interestingly, TSmole cells have a growth advantage over TSbip cells only after they reach confluence. We found that p57KIP2, a maternally expressed gene encoding a cyclin-dependent kinase inhibitor, is strongly induced by increased cell density in TSbip cells, but not in TSmole cells. 
Knockout and overexpression studies suggest that loss of p57KIP2 expression would be the major cause of the reduced sensitivity to contact inhibition in CHMs. Our findings shed light on the molecular mechanism underlying the pathogenesis of CHMs and could have broad implications in tumorigenesis beyond CHMs because silencing of p57KIP2 is frequently observed in a variety of human tumors.
|
35
|
Eggertsson HP, Kristmundsdottir S, Beyter D, Jonsson H, Skuladottir A, Hardarson MT, Gudbjartsson DF, Stefansson K, Halldorsson BV, Melsted P. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nat Commun 2019; 10:5402. [PMID: 31776332 PMCID: PMC6881350 DOI: 10.1038/s41467-019-13341-9] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 04/23/2019] [Accepted: 10/30/2019] [Indexed: 12/31/2022]
Abstract
Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short-reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole-genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high-confidence.
Affiliation(s)
- Hannes P Eggertsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.
| | - Snaedis Kristmundsdottir
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- School of Science and Engineering, Reykjavik University, Reykjavik, Iceland
| | - Doruk Beyter
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
| | - Hakon Jonsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
| | | | | | - Daniel F Gudbjartsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Kari Stefansson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Science and Engineering, Reykjavik University, Reykjavik, Iceland.
| | - Pall Melsted
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.
| |
|
36
|
Zhou A, Lin T, Xing J. Evaluating nanopore sequencing data processing pipelines for structural variation identification. Genome Biol 2019; 20:237. [PMID: 31727126 PMCID: PMC6857234 DOI: 10.1186/s13059-019-1858-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/18/2019] [Accepted: 10/10/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated. RESULTS Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall SV callers' performance varies depending on the SV types. For an initial data assessment, we recommend using aligner minimap2 in combination with SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve the SV call performance. CONCLUSIONS We present a workflow for evaluating aligners and SV callers for nanopore sequencing data and approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and an integrated call set can provide enhanced performance. The nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will be available to more accurately assess the performance of available tools and facilitate further tool development.
Affiliation(s)
- Anbo Zhou
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Timothy Lin
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
- Human Genetics Institute of New Jersey, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA.
| |
|
37
|
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
Affiliation(s)
- Sara Ballouz
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Alexander Dobin
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Jesse A Gillis
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA.
| |
|
38
|
Informatics for PacBio Long Reads. Adv Exp Med Biol 2019; 1129:119-129. [PMID: 30968364 DOI: 10.1007/978-981-13-6037-4_8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/27/2022]
Abstract
In this article, we review the development of a wide variety of bioinformatics software implementing state-of-the-art algorithms since the introduction of SMRT sequencing technology into the field. We focus on the three major categories of development: read mapping (aligning to reference genomes), de novo assembly, and detection of structural variants. The long SMRT reads benefit all the applications, but they are achievable only through considering the nature of the long reads technology properly.
|
39
|
Firtina C, Bar-Joseph Z, Alkan C, Cicek AE. Hercules: a profile HMM-based hybrid error correction algorithm for long reads. Nucleic Acids Res 2019; 46:e125. [PMID: 30124947 PMCID: PMC6265270 DOI: 10.1093/nar/gky724] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/24/2018] [Accepted: 08/07/2018] [Indexed: 01/15/2023]
Abstract
Choosing whether to use second or third generation sequencing platforms can lead to trade-offs between accuracy and read length. Several types of studies require long and accurate reads. In such cases researchers often combine both technologies and the erroneous long reads are corrected using the short reads. Current approaches rely on various graph or alignment based techniques and do not take the error profile of the underlying technology into account. Efficient machine learning algorithms that address these shortcomings have the potential to achieve more accurate integration of these two technologies. We propose Hercules, the first machine learning-based long read error correction algorithm. Hercules models every long read as a profile Hidden Markov Model with respect to the underlying platform’s error profile. The algorithm learns a posterior transition/emission probability distribution for each long read to correct errors in these reads. We show on two DNA-seq BAC clones (CH17-157L1 and CH17-227A2) that Hercules-corrected reads have the highest mapping rate among all competing algorithms and have the highest accuracy when the breadth of coverage is high. On a large human CHM1 cell line WGS data set, Hercules is one of the few scalable algorithms; and among those, it achieves the highest accuracy.
Affiliation(s)
- Can Firtina
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| | - Ziv Bar-Joseph
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
| | - A Ercument Cicek
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey.,Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
40
|
Nagasaki M, Kuroki Y, Shibata TF, Katsuoka F, Mimori T, Kawai Y, Minegishi N, Hozawa A, Kuriyama S, Suzuki Y, Kawame H, Nagami F, Takai-Igarashi T, Ogishima S, Kojima K, Misawa K, Tanabe O, Fuse N, Tanaka H, Yaegashi N, Kinoshita K, Kure S, Yasuda J, Yamamoto M. Construction of JRG (Japanese reference genome) with single-molecule real-time sequencing. Hum Genome Var 2019; 6:27. [PMID: 31231536 PMCID: PMC6555796 DOI: 10.1038/s41439-019-0057-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2018] [Revised: 01/28/2019] [Accepted: 03/15/2019] [Indexed: 12/14/2022] Open
Abstract
In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bp to ~10,000 bp in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating into GRCh38 the 903 verified insertions, totaling 1,086,173 bases, that were shared by at least two Japanese individuals. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.
Researchers in Japan have assembled a Japanese reference genome, which includes sequences missing from the international reference genome, as well as others specific to East Asian populations. A team led by Masao Nagasaki and Masayuki Yamamoto sequenced a Japanese individual using a method that produces longer sequences than previous technologies. Using this approach, they identified thousands of sequences spanning 2.5 million bases that were absent from the international reference genome. Many of these were sequences able to move within the genome. They showed that the majority of these sequences are also present in early humans and chimpanzees, demonstrating that their absence from the current reference is due to deletions or limitations of earlier sequencing methodologies. In addition to providing a population-specific reference, these findings demonstrate the importance of continually improving the international reference genome.
Affiliation(s)
- Masao Nagasaki
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,3Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Yoko Kuroki
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,4Department of Genome Medicine, National Center for Child Health and Development, Tokyo, Japan
| | - Tomoko F Shibata
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Fumiki Katsuoka
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Takahiro Mimori
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Yosuke Kawai
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,3Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Naoko Minegishi
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Atsushi Hozawa
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Shinichi Kuriyama
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,5International Research Institute of Disaster Science, Tohoku University, Sendai, Japan
| | - Yoichi Suzuki
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Hiroshi Kawame
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Fuji Nagami
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | | | - Soichi Ogishima
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Kaname Kojima
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,3Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Kazuharu Misawa
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Osamu Tanabe
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Nobuo Fuse
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,6Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Hiroshi Tanaka
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Nobuo Yaegashi
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,6Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Kengo Kinoshita
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,3Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Shigeo Kure
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan.,6Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Jun Yasuda
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Masayuki Yamamoto
- 1Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan.,2Graduate School of Medicine, Tohoku University, Sendai, Japan
|
41
|
Yu H, Wang J, Sheng X, Zhao Z, Shen Y, Branca F, Gu H. Construction of a high-density genetic map and identification of loci controlling purple sepal trait of flower head in Brassica oleracea L. italica. BMC PLANT BIOLOGY 2019; 19:228. [PMID: 31146678 PMCID: PMC6543578 DOI: 10.1186/s12870-019-1831-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/28/2018] [Accepted: 05/14/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND Some broccoli (Brassica oleracea L. italica) accessions have purple sepals, and cold weather deepens the purple color, while the sepals of other broccoli lines remain green even in cold winters. The related locus or gene is still unknown. In this study, a high-density genetic map was constructed based on specific locus amplified fragment (SLAF) sequencing in a doubled-haploid segregating population of 127 individuals. The purple sepal trait in flower heads was then mapped using phenotypic data collected during three seasons. RESULTS A genetic map was constructed that contained 6694 SLAF markers, with an average sequencing depth of 81.37-fold in the maternal line, 84-fold in the paternal line, and 15.76-fold in each individual of the population. Across all of the annual data, three quantitative trait loci (QTLs) were identified, all located within linkage group (LG) 1. Among them, a major locus, qPH.C01-2, located at 36.393 cM on LG1, was consistently detected in all analyses. In addition to this locus, two minor loci, qPH.C01-4 and qPH.C01-5, were identified near qPH.C01-2 based on the phenotypic data from spring of 2018. CONCLUSION The purple sepal trait could be controlled by a single major locus and two minor loci. The genetic map and the location of the purple sepal trait of flower heads provide an important foundation for mapping other compound traits and for identifying the genes related to the purple sepal trait in broccoli.
Affiliation(s)
- Huifang Yu
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Jiansheng Wang
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xiaoguang Sheng
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Zhenqing Zhao
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yusen Shen
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Ferdinando Branca
- Department of Agriculture, Food and Environment, University of Catania, 95123 Catania, Italy
| | - Honghui Gu
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
|
42
|
A comparative analysis of methods for de novo assembly of hymenopteran genomes using either haploid or diploid samples. Sci Rep 2019; 9:6480. [PMID: 31019201 PMCID: PMC6482151 DOI: 10.1038/s41598-019-42795-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Accepted: 04/04/2019] [Indexed: 01/05/2023] Open
Abstract
Diverse invertebrate taxa including all 200,000 species of Hymenoptera (ants, bees, wasps and sawflies) have a haplodiploid sex determination system, where females are diploid and males are haploid. Thus, hymenopteran genome projects can make use of DNA from a single haploid male sample, which is assumed advantageous for genome assembly. For the purpose of gene annotation, transcriptome sequencing is usually conducted using RNA from a pool of individuals. We conducted a comparative analysis of genome and transcriptome assembly and annotation methods, using genetic sources of different ploidy: (1) DNA from a haploid male or a diploid female (2) RNA from the same haploid male or a pool of individuals. We predicted that the use of a haploid male as opposed to a diploid female will simplify the genome assembly and gene annotation thanks to the lack of heterozygosity. Using DNA and RNA from the same haploid individual is expected to provide better confidence in transcript-to-genome alignment, and improve the annotation of gene structure in terms of the exon/intron boundaries. The haploid genome assemblies proved to be more contiguous, with both contig and scaffold N50 size at least threefold greater than their diploid counterparts. Completeness evaluation showed mixed results. The SOAPdenovo2 diploid assembly was missing more genes than the haploid assembly. The SPAdes diploid assembly had more complete genes, but a higher level of duplicates, and a greatly overestimated genome size. When aligning the two transcriptomes against the male genome, the male transcriptome gave 2–3% more complete transcripts than the pool transcriptome for genes with comparable expression levels in both transcriptomes. However, this advantage disappears in the final results of the gene annotation pipeline that incorporates evidence from homologous proteins. The RNA pool is still required to obtain the full transcriptome with genes that are expressed in other life stages and castes. 
In conclusion, the use of a haploid source material for a de novo genome project provides a substantial advantage to the quality of the genome draft and the use of RNA from the same haploid individual for transcriptome to genome alignment provides a minor advantage for genes that are expressed in the adult male.
|
43
|
Pacini CE, Bradshaw CR, Garrett NJ, Koziol MJ. Characteristics and homogeneity of N6-methylation in human genomes. Sci Rep 2019; 9:5185. [PMID: 30914725 PMCID: PMC6435722 DOI: 10.1038/s41598-019-41601-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2019] [Accepted: 03/13/2019] [Indexed: 12/31/2022] Open
Abstract
A novel DNA modification, N-6 methylated deoxyadenosine (m6dA), has recently been discovered in eukaryotic genomes. Despite its low abundance in eukaryotes, m6dA is implicated in human diseases such as cancer. It is therefore important to precisely identify and characterize m6dA in the human genome. Here, we identify m6dA sites at nucleotide resolution, genome wide, in different human cells. We compare m6dA features between distinct human cells and identify m6dA characteristics in human genomes. Our data demonstrate for the first time that, despite low m6dA abundance, the m6dA mark often occurs consistently at the same genomic location within a given human cell type, demonstrating m6dA homogeneity. We further show, for the first time, higher levels of m6dA homogeneity within one chromosome. Most m6dA marks are found on a single chromosome from a diploid sample, suggesting inheritance. Our transcriptome analysis not only indicates that human genes with m6dA are associated with higher RNA transcript levels but also identifies allele-specific gene transcripts showing haplotype-specific m6dA methylation, which are implicated in different biological functions. Our analyses demonstrate the precision and consistency with which the m6dA mark occurs within the human genome, suggesting that m6dA marks are precisely inherited in humans.
Affiliation(s)
- Clare E Pacini
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
- Department of Zoology, University of Cambridge, Cambridge, CB3 3EJ, UK
| | - Charles R Bradshaw
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
| | - Nigel J Garrett
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
- Department of Zoology, University of Cambridge, Cambridge, CB3 3EJ, UK
| | - Magdalena J Koziol
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK.
- Department of Zoology, University of Cambridge, Cambridge, CB3 3EJ, UK.
|
44
|
Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, Dougherty ML, Nelson BJ, Shah A, Dutcher SK, Warren WC, Magrini V, McGrath SD, Li YI, Wilson RK, Eichler EE. Characterizing the Major Structural Variant Alleles of the Human Genome. Cell 2019; 176:663-675.e19. [PMID: 30661756 PMCID: PMC6438697 DOI: 10.1016/j.cell.2018.12.019] [Citation(s) in RCA: 278] [Impact Index Per Article: 55.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2018] [Revised: 09/01/2018] [Accepted: 12/12/2018] [Indexed: 12/17/2022]
Abstract
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
Affiliation(s)
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Tina A Graves-Lindsay
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - AnneMarie E Welch
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Max L Dougherty
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Bradley J Nelson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Ankeeta Shah
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Wesley C Warren
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Vincent Magrini
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA; The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Sean D McGrath
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA
| | - Yang I Li
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Richard K Wilson
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA; The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
|
45
|
|
46
|
Wang H, Chai Z, Hu D, Ji Q, Xin J, Zhang C, Zhong J. A global analysis of CNVs in diverse yak populations using whole-genome resequencing. BMC Genomics 2019; 20:61. [PMID: 30658572 PMCID: PMC6339343 DOI: 10.1186/s12864-019-5451-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2018] [Accepted: 01/11/2019] [Indexed: 12/01/2022] Open
Abstract
Background Genomic structural variation is a source of genetic and phenotypic variation that may be subject to selection during environmental adaptation and population differentiation. Here, we describe a genome-wide analysis of copy number variations (CNVs) in 16 populations of yak based on genome resequencing data, together with CNV-based cluster analyses of these populations. Results In total, we identified 51,461 CNV events and defined 3174 copy number variation regions (CNVRs) that covered 163.8 Mb (6.2%) of the yak genome, with more “loss” events than either “gain” or “both” events, and we confirmed 31 CNVRs in 36 selected yaks using quantitative PCR. Of the total 163.8 Mb of CNVR coverage, 10.8 Mb of high-confidence CNVRs directly overlapped the 52.9 Mb of segmental duplications, and we confirmed their uneven distribution across chromosomes. Furthermore, functional annotation indicated that the CNVR-harbored genes have a considerable variety of molecular functions, including immune response, glucose metabolism, and sensory perception. Notably, some of the identified CNVR-harbored genes are associated with adaptation to hypoxia (e.g., DCC, MRPS28, GSTCD, MOGAT2, DEXI, CIITA, and SMYD1). Additionally, cluster analysis, based on either individuals or populations, showed that the CNV clustering was divided into two origins, indicating that some yak CNVs are likely to have arisen independently in different populations and to contribute to population differences. Conclusions Collectively, the results of the present study advance our understanding of CNV as an important type of genomic structural variation in yak and provide a useful genomic resource to facilitate further research on yak evolution and breeding.
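The abstract above reports both CNV events and the CNVRs derived from them, but it does not say how events were grouped into regions. The sketch below illustrates one common convention, merging calls that overlap on the same chromosome, as an assumption only; the function name `merge_cnv_calls` and the toy coordinates are hypothetical and are not taken from the study.

```python
def merge_cnv_calls(calls):
    """Merge overlapping (chrom, start, end) CNV calls into regions.

    Assumption: two calls belong to the same region when they are on the
    same chromosome and their intervals overlap or touch. This is a common
    convention, not necessarily the rule used in the study above.
    """
    regions = []
    for chrom, start, end in sorted(calls):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the current region to cover the new call.
            last[2] = max(last[2], end)
        else:
            regions.append([chrom, start, end])
    return [tuple(region) for region in regions]


# Hypothetical toy calls: two overlapping events and two isolated ones.
toy_calls = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
toy_regions = merge_cnv_calls(toy_calls)
```

With these toy calls, the two overlapping events on chr1 collapse into a single region, so four events yield three regions.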
Affiliation(s)
- Hui Wang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, 610000, People's Republic of China
| | - Zhixin Chai
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, 610000, People's Republic of China
| | - Dan Hu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, 610000, People's Republic of China
| | - Qiumei Ji
- State Key Laboratory of Barley and Yak Germplasm Resources and Genetic Improvement, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850000, People's Republic of China
| | - Jinwei Xin
- State Key Laboratory of Barley and Yak Germplasm Resources and Genetic Improvement, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850000, People's Republic of China
| | - Chengfu Zhang
- State Key Laboratory of Barley and Yak Germplasm Resources and Genetic Improvement, Tibet Academy of Agricultural and Animal Husbandry Sciences, Lhasa, 850000, People's Republic of China
| | - Jincheng Zhong
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization (Southwest Minzu University), Ministry of Education, Chengdu, 610000, People's Republic of China.
|
47
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019; 8:5256637. [PMID: 30576505 PMCID: PMC6324547 DOI: 10.1093/gigascience/giy157] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Accepted: 11/27/2018] [Indexed: 02/05/2023] Open
Abstract
Background Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. Findings We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool utilizes long reads generated from TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed more gaps, faster, with a lower error rate and much lower memory usage than two existing state-of-the-art tools. The tool filled more gaps when using raw reads than when using error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and from large and complex genomes. After gap closure with this tool, the contig N50 size of the human CHM1 genome improved from 143 kb to 19 Mb, a 132-fold increase. We also closed gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. Conclusions LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies.
A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
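Contig N50 is the contiguity measure quoted throughout the abstract above. As a minimal sketch of the standard definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size), the function below computes it for a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions and do not come from LR_Gapcloser.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly before and after closing the gap that joins
# the 40-unit and 30-unit contigs into a single 70-unit contig.
toy_before = [40, 30, 20, 10]
toy_after = [70, 20, 10]

n50_before = n50(toy_before)  # 30
n50_after = n50(toy_after)    # 70
```

Closing a single gap between two large contigs raises N50 from 30 to 70 in this toy case, which is why gap closure can change the statistic so sharply even though the total assembly size stays the same.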
Affiliation(s)
- Gui-Cai Xu
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China.,College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
| | - Tian-Jun Xu
- College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
| | - Rui Zhu
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China.,College of Fisheries and Life Science, Shanghai Ocean University, 999 Huchenghuan Road, Shanghai, 201306, China
| | - Yan Zhang
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
| | - Shang-Qi Li
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
| | - Hong-Wei Wang
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
| | - Jiong-Tang Li
- Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
|
48
|
Meng J, Xu Y, Shen X, Liang C. A novel frameshift PKD1 mutation in a Chinese patient with autosomal dominant polycystic kidney disease and azoospermia: A case report. Exp Ther Med 2019; 17:507-511. [PMID: 30651829 DOI: 10.3892/etm.2018.6946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2018] [Accepted: 10/03/2018] [Indexed: 11/05/2022] Open
Abstract
Autosomal dominant polycystic kidney disease (ADPKD) is primarily caused by mutations in polycystin 1, transient receptor potential channel interacting (PKD1) and PKD2, and is characterized by numerous cysts in various organs, primarily the kidneys and liver. The present case report is on a 33-year-old Chinese male patient who suffered from abdominal pain and hypertension, and presented with long-term infertility. Laboratory tests indicated that the patient had normal renal function, while abdominal computed tomography demonstrated that the patient had enlarged kidneys with a volume of 1,127.21 cm³. In a semen analysis, no sperm were detected, while a subsequent testicular biopsy demonstrated numerous mature sperm with progressive motility, which suggests that the cysts of the epididymis and the dilated seminal vesicles may have obstructed the ejaculation of semen. Genetic testing identified a novel frameshift mutation (c.9053delT) that was responsible for the disease. ADPKD varies in severity depending on whether there is a PKD1 or PKD2 mutation and whether the mutation impairs the function of the polycystin protein. Therefore, genetic testing is important for the clinical diagnosis and prognosis of ADPKD patients, as well as for prenatal diagnosis.
Affiliation(s)
- Jialin Meng
  - Department of Urology, The First Affiliated Hospital of Anhui Medical University and Institute of Urology, Anhui Medical University, Hefei, Anhui 230022, P.R. China
- Yuchen Xu
  - Department of Urology, The First Affiliated Hospital of Anhui Medical University and Institute of Urology, Anhui Medical University, Hefei, Anhui 230022, P.R. China
- Xufeng Shen
  - Department of Urology, The First Affiliated Hospital of Anhui Medical University and Institute of Urology, Anhui Medical University, Hefei, Anhui 230022, P.R. China
- Chaozhao Liang
  - Department of Urology, The First Affiliated Hospital of Anhui Medical University and Institute of Urology, Anhui Medical University, Hefei, Anhui 230022, P.R. China
|
49
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019. [PMID: 30576505 DOI: 10.5524/100540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/13/2023]
Abstract
BACKGROUND: Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. FINDINGS: We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assemblies. This tool utilizes long reads generated from TGS platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed more gaps, faster, with a lower error rate and much lower memory usage than two existing state-of-the-art tools. The tool filled more gaps with raw reads than with error-corrected reads. It is applicable to gaps in assemblies produced by different approaches and from large and complex genomes. After gap closure with this tool, the contig N50 size of the human CHM1 genome improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. The optimal proposed hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. CONCLUSIONS: LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies. The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.
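The contig N50 figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common definition of N50 (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length). The function names `contig_n50` and `fold_change` and the toy contig lengths are hypothetical and are not taken from LR_Gapcloser.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_change(before, after):
    """Ratio of a metric after gap closure to its value before."""
    return after / before


# Toy contig lengths (hypothetical, not data from the paper).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(contig_n50(toy_lengths))

# Rounded N50 values from the abstract: 143 kb before, 19 Mb after.
print(round(fold_change(143_000, 19_000_000), 1))
```

With the rounded values from the abstract the ratio comes out near 133, which is in line with the reported 132-fold increase if the published figure was computed from unrounded N50 values (an assumption, since the abstract gives only rounded numbers).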
Affiliation(s)
- Gui-Cai Xu
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
  - College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
- Tian-Jun Xu
  - College of Marine Science, Zhejiang Ocean University, 1 Haida South Road, Zhoushan, 316022, China
- Rui Zhu
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
  - College of Fisheries and Life Science, Shanghai Ocean University, 999 Huchenghuan Road, Shanghai, 201306, China
- Yan Zhang
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Shang-Qi Li
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Hong-Wei Wang
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
- Jiong-Tang Li
  - Key Laboratory of Aquatic Genomics, Ministry of Agriculture and Rural Affairs, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Chinese Academy of Fishery Sciences, 150 Yongding Road, Beijing, 100141, China
|
50
|
Xu GC, Xu TJ, Zhu R, Zhang Y, Li SQ, Wang HW, Li JT. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. Gigascience 2019. [PMID: 30576505 DOI: 10.1093/gigascience/giy157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/29/2023]
|