1
Liang X, Duan Q, Li B, Wang Y, Bu Y, Zhang Y, Kuang Z, Mao L, An X, Wang H, Yang X, Wan N, Feng Z, Shen W, Miao W, Chen J, Liu S, Storz JF, Liu J, Nevo E, Li K. Genomic structural variation contributes to evolved changes in gene expression in high-altitude Tibetan sheep. Proc Natl Acad Sci U S A 2024; 121:e2322291121. [PMID: 38913905 PMCID: PMC11228492 DOI: 10.1073/pnas.2322291121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/06/2024] [Indexed: 06/26/2024]
Abstract
Tibetan sheep were introduced to the Qinghai Tibet plateau roughly 3,000 B.P., making this species a good model for investigating genetic mechanisms of high-altitude adaptation over a relatively short timescale. Here, we characterize genomic structural variants (SVs) that distinguish Tibetan sheep from closely related, low-altitude Hu sheep, and we examine associated changes in tissue-specific gene expression. We document differentiation between the two sheep breeds in frequencies of SVs associated with genes involved in cardiac function and circulation. In Tibetan sheep, we identified high-frequency SVs in a total of 462 genes, including EPAS1, PAPSS2, and PTPRD. Single-cell RNA-Seq data and luciferase reporter assays revealed that the SVs had cis-acting effects on the expression levels of these three genes in specific tissues and cell types. In Tibetan sheep, we identified a high-frequency chromosomal inversion that exhibited modified chromatin architectures relative to the noninverted allele that predominates in Hu sheep. The inversion harbors several genes with altered expression patterns related to heart protection, brown adipocyte proliferation, angiogenesis, and DNA repair. These findings indicate that SVs represent an important source of genetic variation in gene expression and may have contributed to high-altitude adaptation in Tibetan sheep.
Affiliation(s)
- Xiaolong Liang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Qijiao Duan
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Bowen Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yinjia Wang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yueting Bu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Yonglu Zhang
  - Fengjia Town Health Center, Rushan City, Weihai City 264200, China
- Zhuoran Kuang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Leyan Mao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Xuan An
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Huihua Wang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaojie Yang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Na Wan
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Zhilong Feng
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Wei Shen
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Weilan Miao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Jiaqi Chen
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Sanyuan Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Jay F. Storz
  - School of Biological Sciences, University of Nebraska, Lincoln, NE 68588
- Jianquan Liu
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
- Eviatar Nevo
  - Institute of Evolution, University of Haifa, Haifa 3498838, Israel
- Kexin Li
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China
2
Jeong H, Dishuck PC, Yoo D, Harvey WT, Munson KM, Lewis AP, Kordosky J, Garcia GH, Yilmaz F, Hallast P, Lee C, Pastinen T, Eichler EE. Structural polymorphism and diversity of human segmental duplications. bioRxiv 2024:2024.06.04.597452. [PMID: 38895457 PMCID: PMC11185583 DOI: 10.1101/2024.06.04.597452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Segmental duplications (SDs) contribute significantly to human disease, evolution, and diversity yet have been difficult to resolve at the sequence level. We present a population genetics survey of SDs by analyzing 170 human genome assemblies where the majority of SDs are fully resolved using long-read sequence assembly. Excluding the acrocentric short arms, we identify 173.2 Mbp of duplicated sequence (47.4 Mbp not present in the telomere-to-telomere reference) distinguishing fixed from structurally polymorphic events. We find that intrachromosomal SDs are among the most variable with rare events mapping near their progenitor sequences. African genomes harbor significantly more intrachromosomal SDs and are more likely to have recently duplicated gene families with higher copy number when compared to non-African samples. A comparison to a resource of 563 million full-length Iso-Seq reads identifies 201 novel, potentially protein-coding genes corresponding to these copy number polymorphic SDs.
Affiliation(s)
- Hyeonsoo Jeong
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Altos Labs, San Diego, CA, USA
- Philip C. Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Gage H. Garcia
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Feyza Yilmaz
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Tomi Pastinen
  - Children’s Mercy Hospital and University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024; 630:401-411. [PMID: 38811727 PMCID: PMC11168930 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
- Brandon D Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bornberg-Bauer
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health and Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yong-Hwee E Loh
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kelly M McGarvey
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joana L Rocha
  - University of California Berkeley, Berkeley, CA, USA
- Fedor Ryabov
  - Masters Program, National Research University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alice C Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
4
Laudanski K, Elmadhoun O, Mathew A, Kahn-Pascual Y, Kerfeld MJ, Chen J, Sisniega DC, Gomez F. Anesthetic Considerations for Patients with Hereditary Neuropathy with Liability to Pressure Palsies: A Narrative Review. Healthcare (Basel) 2024; 12:858. [PMID: 38667620 PMCID: PMC11050561 DOI: 10.3390/healthcare12080858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 03/28/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Hereditary neuropathy with liability to pressure palsies (HNPP) is an autosomal dominant demyelinating neuropathy characterized by an increased susceptibility to peripheral nerve injury from trauma, compression, or shear forces. Patients with this condition are unique, necessitating distinct considerations for anesthesia and surgical teams. This review describes the etiology, prevalence, clinical presentation, and management of HNPP and presents contemporary evidence and recommendations for optimal care for HNPP patients in the perioperative period. While the incidence of HNPP is reported at 7-16:100,000, this figure may be an underestimation due to underdiagnosis, further complicating medicolegal issues. With the subtle nature of symptoms associated with HNPP, patients with this condition may remain unrecognized during the perioperative period, posing significant risks. Several aspects of caring for this population, including anesthetic choices, intraoperative positioning, and monitoring strategy, may deviate from standard practices. As such, a tailored approach to caring for this unique population, coupled with meticulous preoperative planning, is crucial and requires a multidisciplinary approach.
Affiliation(s)
- Krzysztof Laudanski
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Omar Elmadhoun
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Amal Mathew
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA
- Yul Kahn-Pascual
  - St George’s University Hospitals NHS Foundation Trust, London SW17 0QT, UK
- Mitchell J. Kerfeld
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- James Chen
  - Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN 55902, USA
- Daniella C. Sisniega
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Francisco Gomez
  - Department of Neurology, University of Missouri, Columbia, MO 65211, USA
5
Zhang S, Xu N, Fu L, Yang X, Li Y, Yang Z, Feng Y, Ma K, Jiang X, Han J, Hu R, Zhang L, de Gennaro L, Ryabov F, Meng D, He Y, Wu D, Yang C, Paparella A, Mao Y, Bian X, Lu Y, Antonacci F, Ventura M, Shepelev VA, Miga KH, Alexandrov IA, Logsdon GA, Phillippy AM, Su B, Zhang G, Eichler EE, Lu Q, Shi Y, Sun Q, Mao Y. Comparative genomics of macaques and integrated insights into genetic variation and population history. bioRxiv 2024:2024.04.07.588379. [PMID: 38645259 PMCID: PMC11030432 DOI: 10.1101/2024.04.07.588379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting the potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.
6
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Affiliation(s)
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Jean Monlong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Adam M. Novak
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jordan M. Eizenga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Yan Gao
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haley J. Abel
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Carl A. Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Guillaume Bourque
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Silvia Buonaiuto
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Mark J. P. Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Xian H. Chang
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Sarah Cody
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert M. Cook-Deegan
  - Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
- Omar E. Cornejo
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Daniel Doerr
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jana Ebler
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Jordan M. Eizenga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Susan Fairley
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L. Felsenfeld
  - National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Xiaowen Feng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Robert S. Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Yan Gao
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shilpa Garg
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Nanibaa’ A. Garrison
  - Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Carlos Garcia Giron
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard E. Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Dovetail Genomics, Scotts Valley, CA, USA
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Genomics Research Centre, Human Technopole, Milan, Italy
- Leanne Haggerty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ira M. Hall
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
  - Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Simon Heumos
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
  - Northeastern University, Boston, MA, USA
- Erich D. Jarvis
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Hanlee P. Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E. Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A. Koenig
  - Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jan O. Korbel
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Julian K. Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Hugo Magalhães
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Tobias Marschall
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Adam M. Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E. Olsen
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
7
Bukhman YV, Morin PA, Meyer S, Chu LF, Jacobsen JK, Antosiewicz-Bourget J, Mamott D, Gonzales M, Argus C, Bolin J, Berres ME, Fedrigo O, Steill J, Swanson SA, Jiang P, Rhie A, Formenti G, Phillippy AM, Harris RS, Wood JMD, Howe K, Kirilenko BM, Munegowda C, Hiller M, Jain A, Kihara D, Johnston JS, Ionkov A, Raja K, Toh H, Lang A, Wolf M, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography. Mol Biol Evol 2024; 41:msae036. [PMID: 38376487 PMCID: PMC10919930 DOI: 10.1093/molbev/msae036] [Received: 03/08/2023] [Revised: 01/11/2024] [Accepted: 01/22/2024] [Indexed: 02/21/2024]
Abstract
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long-read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate the blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
Affiliation(s)
- Yury V Bukhman
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Phillip A Morin
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
| | - Susanne Meyer
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Li-Fang Chu
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Department of Comparative Biology and Experimental Medicine, University of Calgary, Calgary, Canada
| | | | | | - Daniel Mamott
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Maylie Gonzales
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Cara Argus
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Jennifer Bolin
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Mark E Berres
- University of Wisconsin Biotechnology Center, Bioinformatics Resource Center, University of Wisconsin - Madison, Madison, WI 53706, USA
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
| | - John Steill
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Scott A Swanson
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Peng Jiang
- Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH, USA
- Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA
- Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Giulio Formenti
- Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
| | | | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
| | - Bogdan M Kirilenko
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Chetan Munegowda
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - J Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA
| | - Alexander Ionkov
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Kalpana Raja
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Huishi Toh
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Aimee Lang
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
| | - Magnus Wolf
- Institute for Evolution and Biodiversity (IEB), University of Muenster, 48149, Muenster, Germany
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
| | - James A Thomson
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA
- Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53726, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Ron Stewart
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
8
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
- The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
| | - Arvis Sulovari
- Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
| | - Paul N Valdmanis
- Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
| | - Danny E Miller
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
- Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
9
Yang Y, Wu Z, Wu Z, Li T, Shen Z, Zhou X, Wu X, Li G, Zhang Y. A near-complete assembly of asparagus bean provides insights into anthocyanin accumulation in pods. Plant Biotechnol J 2023; 21:2473-2489. [PMID: 37558431 PMCID: PMC10651155 DOI: 10.1111/pbi.14142] [Received: 11/17/2022] [Revised: 07/11/2023] [Accepted: 07/23/2023] [Indexed: 08/11/2023]
Abstract
Asparagus bean (Vigna unguiculata ssp. sesquipedialis), a subspecies of V. unguiculata, is a vital legume crop widely cultivated in Asia for its tender pods consumed as vegetables. However, the existing asparagus bean assemblies still contain numerous gaps and unanchored sequences, which presents challenges to functional genomics research. Here, we present an improved reference genome sequence of an elite asparagus bean variety, Fengchan 6, achieved through the integration of nanopore ultra-long reads, PacBio high-fidelity reads, and Hi-C technology. The improved assembly is 521.3 Mb in length and demonstrates several enhancements, including a higher N50 length (46.4 Mb), an anchor ratio of 99.8%, and the presence of only one gap. Furthermore, we successfully assembled 14 telomeres and all 11 centromeres, including four telomere-to-telomere chromosomes. Remarkably, the centromeric regions cover a total length of 38.1 Mb, providing valuable insights into the complex architecture of centromeres. Among the 30 594 predicted protein-coding genes, we identified 2356 genes that are tandemly duplicated in segmental duplication regions. These findings have implications for defence responses and may contribute to evolutionary processes. By utilizing the reference genome, we were able to effectively identify the presence of the gene VuMYB114, which regulates the accumulation of anthocyanins, thereby controlling the purple coloration of the pods. This discovery holds significant implications for understanding the underlying mechanisms of color determination and the breeding process. Overall, the highly improved reference genome serves as a crucial resource and lays a solid foundation for asparagus bean genomic studies and genetic improvement efforts.
Affiliation(s)
- Yi Yang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
| | - Zengxiang Wu
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Tinyao Li
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Zhuo Shen
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Xuan Zhou
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Xinyi Wu
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Guojing Li
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yan Zhang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
10
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
11
Bredemeyer KR, Hillier L, Harris AJ, Hughes GM, Foley NM, Lawless C, Carroll RA, Storer JM, Batzer MA, Rice ES, Davis BW, Raudsepp T, O'Brien SJ, Lyons LA, Warren WC, Murphy WJ. Single-haplotype comparative genomics provides insights into lineage-specific structural variation during cat evolution. Nat Genet 2023; 55:1953-1963. [PMID: 37919451 PMCID: PMC10845050 DOI: 10.1038/s41588-023-01548-y] [Received: 06/23/2023] [Accepted: 09/20/2023] [Indexed: 11/04/2023]
Abstract
The role of structurally dynamic genomic regions in speciation is poorly understood due to challenges inherent in diploid genome assembly. Here we reconstructed the evolutionary dynamics of structural variation in five cat species by phasing the genomes of three interspecies F1 hybrids to generate near-gapless single-haplotype assemblies. We discerned that cat genomes have a paucity of segmental duplications relative to great apes, explaining their remarkable karyotypic stability. X chromosomes were hotspots of structural variation, including enrichment with inversions in a large recombination desert with characteristics of a supergene. The X-linked macrosatellite DXZ4 evolves more rapidly than 99.5% of the genome, clarifying its role in felid hybrid incompatibility. Resolved sensory gene repertoires revealed functional copy number changes associated with ecomorphological adaptations, sociality and domestication. This study highlights the value of gapless genomes to reveal structural mechanisms underpinning karyotypic evolution, reproductive isolation and ecological niche adaptation.
Affiliation(s)
- Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - LaDeana Hillier
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Andrew J Harris
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Graham M Hughes
- School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
| | - Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
| | - Colleen Lawless
- School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
| | - Rachel A Carroll
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA
| | | | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Edward S Rice
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA
| | - Brian W Davis
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Terje Raudsepp
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
| | - Stephen J O'Brien
- Guy Harvey Oceanographic Center, Nova Southeastern University, Fort Lauderdale, FL, USA
| | - Leslie A Lyons
- Department of Veterinary Medicine & Surgery, University of Missouri, Columbia, MO, USA
| | - Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA.
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA.
- Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA.
|
12
|
Sun M, Yao C, Shu Q, He Y, Chen G, Yang G, Xu S, Liu Y, Xue Z, Wu J. Telomere-to-telomere pear (Pyrus pyrifolia) reference genome reveals segmental and whole genome duplication driving genome evolution. Horticulture Research 2023; 10:uhad201. [PMID: 38023478 PMCID: PMC10681005 DOI: 10.1093/hr/uhad201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 10/01/2023] [Indexed: 12/01/2023]
Abstract
Previously released pear genomes contain a plethora of gaps and unanchored genetic regions. Here, we report a telomere-to-telomere (T2T) gap-free genome for the red-skinned pear, 'Yunhong No. 1' (YH1; Pyrus pyrifolia), which is mainly cultivated in Yunnan Province (southwest China), the pear's primary region of origin. The YH1 genome is 501.20 Mb long with a contig N50 length of 29.26 Mb. All 17 chromosomes were assembled to the T2T level with 34 characterized telomeres. The 17 centromeres were predicted and mainly consist of centromeric-specific monomers (CEN198) and long terminal repeat (LTR) Gypsy elements (≥74.73%). By filling all unclosed gaps, the integrity of YH1 is markedly improved over previous P. pyrifolia genomes ('Cuiguan' and 'Nijisseiki'). A total of 1531 segmental duplication (SD) driven duplicated genes were identified and enriched in stress response pathways. Intrachromosomal SDs drove the expansion of disease resistance genes, suggesting the potential of SDs in adaptive pear evolution. A large proportion of duplicated gene pairs exhibit dosage effects or sub-/neo-functionalization, which may affect agronomic traits like stone cell content, sugar content, and fruit skin russet. Furthermore, as core regulators of anthocyanin biosynthesis, we found that MYB10 and MYB114 underwent various gene duplication events. Multiple copies of MYB10 and MYB114 displayed obvious dosage effects, indicating role differentiation in the formation of red-skinned pear fruit. In summary, the T2T gap-free pear genome provides invaluable resources for genome evolution and functional genomics.
Affiliation(s)
- Manyi Sun
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Chenjie Yao
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Qun Shu
- Institute of Horticulture, Yunnan Academy of Agricultural Sciences, Kunming 650205, China
| | - Yingyun He
- Institute of Horticulture, Yunnan Academy of Agricultural Sciences, Kunming 650205, China
| | - Guosong Chen
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Guangyan Yang
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Shaozhuo Xu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Yueyuan Liu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Zhaolong Xue
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
| | - Jun Wu
- College of Horticulture, State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhongshan Biological Breeding Laboratory, No.50 Zhongling Street, Nanjing, Jiangsu 210014, China
|
13
|
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled with different reference genomes. Finally, applying the CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Affiliation(s)
- Chentao Yang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yang Zhou
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI Research-Wuhan, BGI, Wuhan, Hubei, China
| | - Yanni Song
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Dongya Wu
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Yan Zeng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Lei Nie
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Guangji Chen
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Jinjin Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Hongling Zhou
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Long Zhou
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xiaobo Qian
- BGI-Shenzhen, Shenzhen, Guangdong, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Chenlu Liu
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | | | | | - Wei Dai
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Mengyang Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yanwei Qi
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Xiaobo Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
| | - Lidong Guo
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Guangyi Fan
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Aijun Wang
- BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
| | - Yuan Deng
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Yong Zhang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Yunqiu He
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
| | - Chunxue Guo
- BGI-Shenzhen, Shenzhen, Guangdong, China
- BGI-Hangzhou, Hangzhou, Zhejiang, China
| | - Guoji Guo
- School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
|
14
|
Liu Q, Xie B, Gao Y, Xu S, Lu Y. A protocol for applying low-coverage whole-genome sequencing data in structural variation studies. STAR Protoc 2023; 4:102433. [PMID: 37432854 PMCID: PMC10362160 DOI: 10.1016/j.xpro.2023.102433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 05/23/2023] [Accepted: 06/12/2023] [Indexed: 07/13/2023] Open
Abstract
Structural variations (SVs) have a great impact on various biological processes and influence physical traits in many species. Here, we present a protocol for applying the low-coverage next-generation sequencing data of Rhipicephalus microplus to detect highly differentiated SVs accurately. We also outline its use to investigate population/species-specific genetic structures, local adaptation, and transcriptional function. We describe steps for constructing variation maps and SV annotation. We then detail population genetic analysis and differential gene expression analysis. For complete details on the usage and execution of this protocol, please refer to Liu et al. (2023).
Affiliation(s)
- Qi Liu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China
| | - Bo Xie
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yang Gao
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China.
| | - Yan Lu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai 201203, China.
|
15
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 74] [Impact Index Per Article: 74.0] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
16
|
Meadows JRS, Kidd JM, Wang GD, Parker HG, Schall PZ, Bianchi M, Christmas MJ, Bougiouri K, Buckley RM, Hitte C, Nguyen AK, Wang C, Jagannathan V, Niskanen JE, Frantz LAF, Arumilli M, Hundi S, Lindblad-Toh K, Ginja C, Agustina KK, André C, Boyko AR, Davis BW, Drögemüller M, Feng XY, Gkagkavouzis K, Iliopoulos G, Harris AC, Hytönen MK, Kalthoff DC, Liu YH, Lymberakis P, Poulakakis N, Pires AE, Racimo F, Ramos-Almodovar F, Savolainen P, Venetsani S, Tammen I, Triantafyllidis A, vonHoldt B, Wayne RK, Larson G, Nicholas FW, Lohi H, Leeb T, Zhang YP, Ostrander EA. Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture. Genome Biol 2023; 24:187. [PMID: 37582787 PMCID: PMC10426128 DOI: 10.1186/s13059-023-03023-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/23/2022] [Accepted: 07/25/2023] [Indexed: 08/17/2023]
Abstract
BACKGROUND: The international Dog10K project aims to sequence and analyze several thousand canine genomes. Incorporating 20× data from 1987 individuals, including 1611 dogs (321 breeds), 309 village dogs, 63 wolves, and four coyotes, we identify genomic variation across the canid family, setting the stage for detailed studies of domestication, behavior, morphology, disease susceptibility, and genome architecture and function. RESULTS: We report the analysis of >48 M single-nucleotide, indel, and structural variants spanning the autosomes, X chromosome, and mitochondria. We discover more than 75% of variation for 239 sampled breeds. Allele sharing analysis indicates that 94.9% of breeds form monophyletic clusters and 25 major clades. German Shepherd Dogs and related breeds show the highest allele sharing with independent breeds from multiple clades. On average, each breed dog differs from the UU_Cfam_GSD_1.0 reference at 26,960 deletions and 14,034 insertions greater than 50 bp, with wolves having 14% more variants. Discovered variants include retrogene insertions from 926 parent genes. To aid functional prioritization, single-nucleotide variants were annotated with SnpEff and Zoonomia phyloP constraint scores. Constrained positions were negatively correlated with allele frequency. Finally, the utility of the Dog10K data as an imputation reference panel is assessed, generating high-confidence calls across varied genotyping platform densities including for breeds not included in the Dog10K collection. CONCLUSIONS: We have developed a dense dataset of 1987 sequenced canids that reveals patterns of allele sharing, identifies likely functional variants, informs breed structure, and enables accurate imputation. Dog10K data are publicly available.
Affiliation(s)
- Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden.
| | - Jeffrey M Kidd
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA.
| | - Guo-Dong Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Heidi G Parker
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Peter Z Schall
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA
| | - Matteo Bianchi
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Matthew J Christmas
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Katia Bougiouri
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
| | - Reuben M Buckley
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Christophe Hitte
- University of Rennes, CNRS, Institute Genetics and Development Rennes - UMR6290, 35000, Rennes, France
| | - Anthony K Nguyen
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA
| | - Chao Wang
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
| | - Vidhya Jagannathan
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Julia E Niskanen
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Laurent A F Frantz
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E14NS, UK and Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, D-80539, Munich, Germany
| | - Meharji Arumilli
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Sruthi Hundi
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Catarina Ginja
- BIOPOLIS-CIBIO-InBIO-Centro de Investigação Em Biodiversidade E Recursos Genéticos - ArchGen Group, Universidade Do Porto, 4485-661, Vairão, Portugal
| | | | - Catherine André
- University of Rennes, CNRS, Institute Genetics and Development Rennes - UMR6290, 35000, Rennes, France
| | - Adam R Boyko
- Department of Biomedical Sciences, Cornell University, 930 Campus Road, Ithaca, NY, 14853, USA
| | - Brian W Davis
- Department of Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843, USA
| | - Michaela Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Xin-Yao Feng
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Konstantinos Gkagkavouzis
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Macedonia 54124, Greece and Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, Greece
| | - Giorgos Iliopoulos
- NGO "Callisto", Wildlife and Nature Conservation Society, 54621, Thessaloniki, Greece
| | - Alexander C Harris
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA
| | - Marjo K Hytönen
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Daniela C Kalthoff
- NGO "Callisto", Wildlife and Nature Conservation Society, 54621, Thessaloniki, Greece
| | - Yan-Hu Liu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Petros Lymberakis
- Natural History Museum of Crete & Department of Biology, University of Crete, 71202, Irakleio, Greece
- Biology Department, School of Sciences and Engineering, University of Crete, Heraklion, Greece
- Palaeogenomics and Evolutionary Genetics Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
| | - Nikolaos Poulakakis
- Natural History Museum of Crete & Department of Biology, University of Crete, 71202, Irakleio, Greece
- Biology Department, School of Sciences and Engineering, University of Crete, Heraklion, Greece
- Palaeogenomics and Evolutionary Genetics Lab, Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
| | - Ana Elisabete Pires
- BIOPOLIS-CIBIO-InBIO-Centro de Investigação Em Biodiversidade E Recursos Genéticos - ArchGen Group, Universidade Do Porto, 4485-661, Vairão, Portugal
| | - Fernando Racimo
- Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark
| | | | - Peter Savolainen
- Department of Gene Technology, Science for Life Laboratory, KTH - Royal Institute of Technology, 17121, Solna, Sweden
| | - Semina Venetsani
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, 54124, Thessaloniki, Macedonia, Greece
| | - Imke Tammen
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2570, Australia
| | - Alexandros Triantafyllidis
- Department of Genetics, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Macedonia 54124, Greece and Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation (CIRI-AUTH), Balkan Center, Thessaloniki, Greece
| | - Bridgett vonHoldt
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, 08544, USA
| | - Robert K Wayne
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, 90095-7246, USA
| | - Greger Larson
- Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, OX1 3TG, UK
| | - Frank W Nicholas
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2570, Australia
| | - Hannes Lohi
- Department of Medical and Clinical Genetics, Department of Veterinary Biosciences, University of Helsinki and Folkhälsan Research Center, 02900, Helsinki, Finland
| | - Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - Ya-Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Elaine A Ostrander
- National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50 Room 5351, Bethesda, MD, 20892, USA.
| |
|
17
|
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 235] [Impact Index Per Article: 235.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | - Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | | | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
| | - Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
| | | | - Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany.
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany.
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA.
| |
|
18
|
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell to cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
| |
|
19
|
Payne ZL, Penny GM, Turner TN, Dutcher SK. A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis. Plant Commun 2023; 4:100493. [PMID: 36397679 PMCID: PMC10030371 DOI: 10.1016/j.xplc.2022.100493] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/27/2022] [Revised: 10/26/2022] [Accepted: 11/15/2022] [Indexed: 05/04/2023]
Abstract
Genomic assemblies of the unicellular green alga Chlamydomonas reinhardtii have provided important resources for researchers. However, assembly errors, large gaps, and unplaced scaffolds as well as strain-specific variants currently impede many types of analysis. By combining PacBio HiFi and Oxford Nanopore long-read technologies, we generated a de novo genome assembly for strain CC-5816, derived from crosses of strains CC-125 and CC-124. Multiple methods of evaluating genome completeness and base-pair error rate suggest that the final telomere-to-telomere assembly is highly accurate. The CC-5816 assembly enabled previously difficult analyses that include characterization of the 17 centromeres, rDNA arrays on three chromosomes, and 56 insertions of organellar DNA into the nuclear genome. Using Nanopore sequencing, we identified sites of cytosine (CpG) methylation, which are enriched at centromeres. We analyzed CRISPR-Cas9 insertional mutants in the PF23 gene. Two of the three alleles produced progeny that displayed patterns of meiotic inviability that suggested the presence of a chromosomal aberration. Mapping Nanopore reads from pf23-2 and pf23-3 onto the CC-5816 genome showed that these two strains each carry a translocation that was initiated at the PF23 gene locus on chromosome 11 and joined with chromosomes 5 or 3, respectively. The translocations were verified by demonstrating linkage between loci on the two translocated chromosomes in meiotic progeny. The three pf23 alleles display the expected short-cilia phenotype, and immunoblotting showed that pf23-2 lacks the PF23 protein. Our CC-5816 genome assembly will undoubtedly provide an important tool for the Chlamydomonas research community.
Affiliation(s)
- Zachary L Payne
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Gervette M Penny
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Tychele N Turner
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Susan K Dutcher
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA.
| |
|
20
|
Harringmeyer OS, Hoekstra HE. Chromosomal inversion polymorphisms shape the genomic landscape of deer mice. Nat Ecol Evol 2022; 6:1965-1979. [PMID: 36253543 PMCID: PMC9715431 DOI: 10.1038/s41559-022-01890-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/06/2022] [Accepted: 08/17/2022] [Indexed: 12/15/2022]
Abstract
Chromosomal inversions are an important form of structural variation that can affect recombination, chromosome structure and fitness. However, because inversions can be challenging to detect, the prevalence and hence the significance of inversions segregating within species remains largely unknown, especially in natural populations of mammals. Here, by combining population-genomic and long-read sequencing analyses in a single, widespread species of deer mouse (Peromyscus maniculatus), we identified 21 polymorphic inversions that are large (1.5-43.8 Mb) and cause near-complete suppression of recombination when heterozygous (0-0.03 cM Mb⁻¹). We found that inversion breakpoints frequently occur in centromeric and telomeric regions and are often flanked by long inverted repeats (0.5-50 kb), suggesting that they probably arose via ectopic recombination. By genotyping inversions in populations across the species' range, we found that the inversions are often widespread and do not harbour deleterious mutational loads, and many are likely to be maintained as polymorphisms by divergent selection. Comparisons of forest and prairie ecotypes of deer mice revealed 13 inversions that contribute to differentiation between populations, of which five exhibit significant associations with traits implicated in local adaptation. Taken together, these results show that inversion polymorphisms have a significant impact on recombination, genome structure and genetic diversity in deer mice and likely facilitate local adaptation across the widespread range of this species.
Affiliation(s)
- Olivia S Harringmeyer
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology and Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA.
| | - Hopi E Hoekstra
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology and Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA.
| |
|
21
|
Toh H, Yang C, Formenti G, Raja K, Yan L, Tracey A, Chow W, Howe K, Bergeron LA, Zhang G, Haase B, Mountcastle J, Fedrigo O, Fogg J, Kirilenko B, Munegowda C, Hiller M, Jain A, Kihara D, Rhie A, Phillippy AM, Swanson SA, Jiang P, Clegg DO, Jarvis ED, Thomson JA, Stewart R, Chaisson MJP, Bukhman YV. A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes. BMC Biol 2022; 20:245. [DOI: 10.1186/s12915-022-01427-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 09/29/2022] [Indexed: 11/09/2022] Open
Abstract
Background
The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, a cone-rich retina, and a propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic.
Results
We report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse.
Conclusions
Our findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.
|
22
|
Vollger MR, Guitart X, Dishuck PC, Mercuri L, Harvey WT, Gershman A, Diekhans M, Sulovari A, Munson KM, Lewis AP, Hoekzema K, Porubsky D, Li R, Nurk S, Koren S, Miga KH, Phillippy AM, Timp W, Ventura M, Eichler EE. Segmental duplications and their variation in a complete human genome. Science 2022; 376:eabj6965. [PMID: 35357917 PMCID: PMC8979283 DOI: 10.1126/science.abj6965] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Indexed: 12/11/2022]
Abstract
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.
Affiliation(s)
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ludovica Mercuri
- Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mario Ventura
- Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
23
|
Aganezov S, Yan SM, Soto DC, Kirsche M, Zarate S, Avdeyev P, Taylor DJ, Shafin K, Shumate A, Xiao C, Wagner J, McDaniel J, Olson ND, Sauria MEG, Vollger MR, Rhie A, Meredith M, Martin S, Lee J, Koren S, Rosenfeld JA, Paten B, Layer R, Chin CS, Sedlazeck FJ, Hansen NF, Miller DE, Phillippy AM, Miga KH, McCoy RC, Dennis MY, Zook JM, Schatz MC. A complete reference genome improves analysis of human genetic variation. Science 2022; 376:eabl3533. [PMID: 35357935 PMCID: PMC9336181 DOI: 10.1126/science.abl3533] [Citation(s) in RCA: 123] [Impact Index Per Article: 61.5] [Indexed: 12/11/2022]
Abstract
Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C. Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel Avdeyev
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Dylan J. Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
| | - Justin Wagner
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D. Olson
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Melissa Meredith
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Skylar Martin
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | - Sergey Koren
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ryan Layer
- Department of Computer Science and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
| | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, National Human Genome Research Institute, Rockville, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital, Seattle, WA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, CA, USA
| | - Justin M. Zook
- National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| |
|
24
|
Išerić H, Alkan C, Hach F, Numanagić I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithms Mol Biol 2022; 17:4. [PMID: 35303886 PMCID: PMC8932185 DOI: 10.1186/s13015-022-00210-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/16/2021] [Accepted: 02/08/2022] [Indexed: 11/29/2022] Open
Abstract
Motivation
The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today.
Results
Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7-33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years.
Availability and implementation
BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser.
Affiliation(s)
- Hamza Išerić
- Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, 06800, Ankara, Turkey
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC, V6H 3Z6, Canada
- Department of Urologic Sciences, University of British Columbia, Vancouver, BC, V5Z 1M9, Canada
| | - Ibrahim Numanagić
- Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada.
| |
|
25
|
Lou H, Gao Y, Xie B, Wang Y, Zhang H, Shi M, Ma S, Zhang X, Liu C, Xu S. Haplotype-resolved de novo assembly of a Tujia genome suggests the necessity for high-quality population-specific genome references. Cell Syst 2022; 13:321-333.e6. [PMID: 35180379 DOI: 10.1016/j.cels.2022.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/14/2021] [Revised: 11/09/2021] [Accepted: 01/27/2022] [Indexed: 12/17/2022]
Abstract
Even though the human reference genome assembly is continually being improved, it remains debatable whether a population-specific reference is necessary for every ethnic group. Here, we de novo assembled an individual genome (TJ1) from the Tujia population, an ethnic minority group most closely related to the Han Chinese. TJ1 provided a high-quality haplotype-resolved assembly of chromosome-scale with a scaffold N50 size >78 Mb. Compared with GRCh38 and other de novo assemblies, TJ1 improved short-read mapping, enhanced calling precision for structural variants, and detected rare and low-frequency variants. This revealed fine-scale differences between the closely related Han and Tujia populations, such as population-stratified variants of LCT and UBXN8, and improved screening for ancestry informative markers. We demonstrated that TJ1 could reduce false positives in clinical diagnosis and analyzed the PRSS1-PRSS2 locus as a test case. Our results suggest that population-specific assemblies are necessary for genetic and medical analysis, especially when closely related populations are studied. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Haiyi Lou
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China.
| | - Yang Gao
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Bo Xie
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yimin Wang
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | | | - Miao Shi
- Berry Genomics, Beijing 102200, China
| | - Sen Ma
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Xiaoxi Zhang
- School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Chang Liu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai 200032, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou 221116, China; Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou 450052, China; Ministry of Education Key Laboratory of Contemporary Anthropology, Human Phenome Institute, Fudan University, Shanghai 201203, China.
| |
|
26
|
Neubert K, Zuchantke E, Leidenfrost RM, Wünschiers R, Grützke J, Malorny B, Brendebach H, Al Dahouk S, Homeier T, Hotzel H, Reinert K, Tomaso H, Busch A. Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures. BMC Genomics 2021; 22:822. [PMID: 34773979 PMCID: PMC8590783 DOI: 10.1186/s12864-021-08115-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Accepted: 10/12/2021] [Indexed: 02/08/2023] Open
Abstract
Background
We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods.
Results
We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. The A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data from eight short-read assembly methods. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach.
Conclusions
Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conservative F. tularensis.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-021-08115-x.
Affiliation(s)
- Kerstin Neubert
- Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany; German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
| | - Eric Zuchantke
- Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
| | - Robert Maximilian Leidenfrost
- Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
| | - Röbbe Wünschiers
- Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
| | - Josephine Grützke
- German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
| | - Burkhard Malorny
- German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
| | - Holger Brendebach
- German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
| | - Sascha Al Dahouk
- German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
| | - Timo Homeier
- Friedrich-Loeffler-Institut, Institute of Epidemiology, Südufer 10, 17493, Greifswald, Insel Riems, Germany
| | - Helmut Hotzel
- Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
| | - Knut Reinert
- Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany
| | - Herbert Tomaso
- Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
| | - Anne Busch
- Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany; Department of Anaesthesiology and Intensive Care Medicine, University Hospital Jena, Jena, Germany.
| |
|
27
|
Raj Kolora SR, Owens GL, Vazquez JM, Stubbs A, Chatla K, Jainese C, Seeto K, McCrea M, Sandel MW, Vianna JA, Maslenikov K, Bachtrog D, Orr JW, Love M, Sudmant PH. Origins and evolution of extreme life span in Pacific Ocean rockfishes. Science 2021; 374:842-847. [PMID: 34762458 PMCID: PMC8923369 DOI: 10.1126/science.abg5332] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Indexed: 12/27/2022]
Abstract
Pacific Ocean rockfishes (genus Sebastes) exhibit extreme variation in life span, with some species being among the most long-lived extant vertebrates. We de novo assembled the genomes of 88 rockfish species and from these identified repeated signatures of positive selection in DNA repair pathways in long-lived taxa and 137 longevity-associated genes with direct effects on life span through insulin signaling and with pleiotropic effects through size and environmental adaptations. A genome-wide screen of structural variation reveals copy number expansions in the immune modulatory butyrophilin gene family in long-lived species. The evolution of different rockfish life histories is coupled to genetic diversity and reshapes the mutational spectrum driving segregating CpG→TpG variants in long-lived species. These analyses highlight the genetic innovations that underlie life history trait adaptations and, in turn, how they shape genomic diversity.
Affiliation(s)
| | - Gregory L. Owens
- University of California Berkeley Department of Integrative Biology
- University of Victoria Department of Biology
| | | | - Alexander Stubbs
- University of California Berkeley Department of Integrative Biology
| | - Kamalakar Chatla
- University of California Berkeley Department of Integrative Biology
| | - Conner Jainese
- University of California Santa Barbara Marine Sciences Institute
| | - Katelin Seeto
- University of California Santa Barbara Marine Sciences Institute
| | - Merit McCrea
- University of California Santa Barbara Marine Sciences Institute
| | | | - Juliana A. Vianna
- Pontificia Universidad Católica de Chile, Departamento de Ecosistemas y Medio Ambiente
| | - Katherine Maslenikov
- University of Washington, School of Aquatic and Fishery Sciences and Burke Museum of Natural History and Culture
| | - Doris Bachtrog
- University of California Berkeley Department of Integrative Biology
| | - James W. Orr
- University of Washington, School of Aquatic and Fishery Sciences and Burke Museum of Natural History and Culture
| | - Milton Love
- University of California Santa Barbara Marine Sciences Institute
| | - Peter H. Sudmant
- University of California Berkeley Department of Integrative Biology
- University of California Berkeley Center for Computational Biology
| |
|
28
|
Li K, Jiang W, Hui Y, Kong M, Feng LY, Gao LZ, Li P, Lu S. Gapless indica rice genome reveals synergistic contributions of active transposable elements and segmental duplications to rice genome evolution. Mol Plant 2021; 14:1745-1756. [PMID: 34171481 DOI: 10.1016/j.molp.2021.06.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 05/04/2021] [Revised: 06/18/2021] [Accepted: 06/22/2021] [Indexed: 05/04/2023]
Abstract
The ultimate goal of genome assembly is a high-accuracy gapless genome. Here, we report a new assembly pipeline that is used to produce a gapless genome for the indica rice cultivar Minghui 63. The resulting 397.71-Mb final assembly is composed of 12 contigs with a contig N50 size of 31.93 Mb. Each chromosome is represented by a single contig and the genomic sequences of all chromosomes are gapless. Quality evaluation of this gapless genome assembly showed that gene regions in our assembly have the highest completeness compared with the other 15 reported high-quality rice genomes. Further comparison with the japonica rice genome revealed that the gapless indica genome assembly contains more transposable elements (TEs) and segmental duplications (SDs), the latter of which produce many duplicated genes that can affect agronomic traits through dose effect or sub-/neo-functionalization. The insertion of TEs can also affect the expression of duplicated genes, which may drive the evolution of these genes. Furthermore, we found the expansion of nucleotide-binding site with leucine-rich repeat disease-resistance genes and cis-zeatin-O-glucosyltransferase growth-related genes in SDs in the gapless indica genome assembly, suggesting that SDs contribute to the adaptive evolution of rice disease resistance and developmental processes. Collectively, our findings suggest that active TEs and SDs synergistically contribute to rice genome evolution.
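The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not values from the Minghui 63 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb (not taken from the paper).
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))
```

For an assembly in which each chromosome is a single contig, as described above, the contig N50 is simply the length of one of the chromosomes.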
Affiliation(s)
- Kui Li
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Wenkai Jiang
  - Novogene Bioinformatics Institute, Building 301, Zone A10 Jiuxianqiao North Road, Chaoyang District, Beijing 100083, China
- Yuanyuan Hui
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Mengjuan Kong
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Li-Ying Feng
  - Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China
- Li-Zhi Gao
  - Institution of Genomics and Bioinformatics, South China Agricultural University, Guangzhou 510642, China
- Pengfu Li
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Shan Lu
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
  - Shenzhen Research Institute of Nanjing University, Shenzhen 518000, China
29
Vervoort L, Dierckxsens N, Pereboom Z, Capozzi O, Rocchi M, Shaikh TH, Vermeesch JR. 22q11.2 Low Copy Repeats Expanded in the Human Lineage. Front Genet 2021; 12:706641. [PMID: 34335701 PMCID: PMC8320366 DOI: 10.3389/fgene.2021.706641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022] Open
Abstract
Segmental duplications or low copy repeats (LCRs) constitute duplicated regions interspersed in the human genome, currently neglected in standard analyses due to their extreme complexity. Recent functional studies have indicated the potential of genes within LCRs in synaptogenesis, neuronal migration, and neocortical expansion in the human lineage. One of the regions with the highest proportion of duplicated sequence is the 22q11.2 locus, carrying eight LCRs (LCR22-A through LCR22-H), and rearrangements between them cause the 22q11.2 deletion syndrome. The LCR22-A block was recently reported to be hypervariable in the human population. It remains unknown whether this variability also exists in non-human primates, since research is strongly hampered by the presence of sequence gaps in the human and non-human primate reference genomes. To chart the LCR22 haplotypes and the associated inter- and intra-species variability, we de novo assembled the region in non-human primates by a combination of optical mapping techniques. A minimal and likely ancient haplotype is present in the chimpanzee, bonobo, and rhesus monkey without intra-species variation. In addition, the optical maps identified assembly errors and closed gaps in the orthologous chromosome 22 reference sequences. These findings indicate the LCR22 expansion to be unique to the human population, which might indicate involvement of the region in human evolution and adaptation. These maps will enable LCR22-specific functional studies and the investigation of potential associations with the phenotypic variability in the 22q11.2 deletion syndrome.
Affiliation(s)
- Zjef Pereboom
  - Centre for Research and Conservation, Royal Zoological Society of Antwerp, Antwerp, Belgium
  - Evolutionary Ecology Group, Department of Biology, Antwerp University, Antwerp, Belgium
- Tamim H. Shaikh
  - Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, United States
30
Abdullaev ET, Umarova IR, Arndt PF. Modelling segmental duplications in the human genome. BMC Genomics 2021; 22:496. [PMID: 34215180 PMCID: PMC8254307 DOI: 10.1186/s12864-021-07789-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Accepted: 06/10/2021] [Indexed: 11/22/2022] Open
Abstract
Background Segmental duplications (SDs) are long DNA sequences that are repeated in a genome and have high sequence identity. In contrast to repetitive elements, they are often unique and only sometimes have multiple copies in a genome. There are several well-studied mechanisms responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. Results We study segmental duplications through a graph representation where nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features that allow us to make inferences about the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, thus giving us insights into the dynamics of segmental duplications in the human genome. Based on our analysis of genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. Conclusions Our analysis suggests that duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates are suggested. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-07789-7).
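The conclusion that duplication rates grow linearly with copy number can be illustrated with a toy simulation in which each duplication event picks a locus with probability proportional to its current copy count. This is a simplified sketch, not the authors' network growth model: the function `simulate_copy_numbers`, its parameters, and the fixed set of starting loci are all assumptions made for illustration.

```python
import random


def simulate_copy_numbers(n_loci, n_events, seed=0):
    """Simulate duplication events with copy-number-proportional rates.

    Every locus starts with one copy. At each event, one locus is drawn
    with probability proportional to its current copy number and gains
    one more copy, so loci that are already duplicated are duplicated
    again more often.
    """
    rng = random.Random(seed)
    copies = [1] * n_loci
    for _ in range(n_events):
        locus = rng.choices(range(n_loci), weights=copies, k=1)[0]
        copies[locus] += 1
    return copies


# Hypothetical run: 10 loci, 200 duplication events.
copies = simulate_copy_numbers(n_loci=10, n_events=200, seed=1)
print(sorted(copies, reverse=True))
```

Under this kind of rule a few loci tend to accumulate most of the copies, which is the qualitative behaviour a preferential duplication rate predicts.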
Affiliation(s)
- Eldar T Abdullaev
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany
- Iren R Umarova
  - Faculty of Computational Mathematics and Cybernetics, Moscow State University, Leninskiye Gory 1-52, Moscow, 119991, Russia
- Peter F Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany
31
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. Bioinformatics Advances 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023]
Abstract
Motivation Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('sample-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information Supplementary data are available at Bioinformatics Advances online.
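The idea of a string that occurs in one sample but not in another can be sketched with a fixed-length k-mer set difference. This is a deliberate simplification and not the method of the paper, whose algorithm and data structures are not described in this abstract; the helper names, the fixed `k`, and the toy reads are assumptions, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_specific_kmers(target_reads, control_reads, k):
    """Return k-mers seen in the target reads but in no control read.

    No mapping to a reference is involved: the comparison is made
    directly between the two read sets.
    """
    control = set()
    for read in control_reads:
        control |= kmers(read, k)
    specific = set()
    for read in target_reads:
        specific |= kmers(read, k) - control
    return specific


# Hypothetical reads that differ by a single base (T versus A).
target = ["ACGTTGCA"]
control = ["ACGTAGCA"]
print(sorted(sample_specific_kmers(target, control, k=4)))
```

Every k-mer that overlaps the differing base is reported, which is why a single variant typically yields a short run of sample-specific strings rather than one.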
Affiliation(s)
- Luca Denti
  - Department of Computational Biology, Institut Pasteur, Paris 75015, France
- Paola Bonizzoni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, 20126, Italy
  - To whom correspondence should be addressed.
- Rayan Chikhi
  - Department of Computational Biology, Institut Pasteur, Paris 75015, France
  - To whom correspondence should be addressed.
- Fereydoun Hormozdiari
  - Genome Center, UC Davis, Davis, CA 95616, USA
  - UC Davis MIND Institute, Sacramento, CA 95817, USA
  - Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA 95817, USA
  - To whom correspondence should be addressed.
32
Dog10K_Boxer_Tasha_1.0: A Long-Read Assembly of the Dog Reference Genome. Genes (Basel) 2021; 12:genes12060847. [PMID: 34070911 PMCID: PMC8228171 DOI: 10.3390/genes12060847] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/05/2021] [Revised: 05/21/2021] [Accepted: 05/27/2021] [Indexed: 12/27/2022] Open
Abstract
The domestic dog has evolved to be an important biomedical model for studies regarding the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer dog named "Tasha" initially published in 2005. Derived from a Sanger whole genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and subsequent updates have served as the predominant resource for canine genetics for 15 years. While the initial assembly produced a good-quality draft, as with all assemblies produced at the time, it contained gaps, assembly errors and missing sequences, particularly in GC-rich regions, which are found at many promoters and in the first exons of protein-coding genes. Here, we present Dog10K_Boxer_Tasha_1.0, an improved chromosome-level highly contiguous genome assembly of Tasha created with long-read technologies that increases sequence contiguity >100-fold, closes >23,000 gaps of the CanFam3.1 reference assembly and improves gene annotation by identifying >1200 new protein-coding transcripts. The assembly and annotation are available at NCBI under the accession GCF_000002285.5.
33
Zhong J, Tang C, Peng W, Xie M, Sun Y, Tang Q, Xiao Q, Yang J. A novel essential protein identification method based on PPI networks and gene expression data. BMC Bioinformatics 2021; 22:248. [PMID: 33985429 PMCID: PMC8120700 DOI: 10.1186/s12859-021-04175-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/07/2020] [Accepted: 05/06/2021] [Indexed: 02/08/2023] Open
Abstract
Background Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and the PPI network data to calculate the similarity of "active" and "inactive" state of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy in predicting essential proteins. Results In this paper, we propose a new measure named JDC, which is based on the PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms, and evaluate our method by using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that the performance of JDC is better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We compare JDC with both NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins generally form densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of the protein depends on the similarity of "active" and "inactive" state of gene expression in a cluster of the PPI network.
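The three ideas listed above (binarize expression, compare active/inactive states, weight by network connectivity) can be sketched as follows. The abstract does not give the JDC formula or its dynamic threshold, so this is only a JDC-like illustration under stated assumptions: the per-gene mean is used as the threshold, and a protein's score is the sum of Jaccard similarities to its neighbours, which grows with both degree and state similarity. All names and the toy data are hypothetical.

```python
from statistics import mean


def binarize(profile):
    """Mark each time point as active (1) or inactive (0).

    Assumption: the threshold is the mean of the gene's own profile;
    the published method uses its own dynamic threshold.
    """
    threshold = mean(profile)
    return [1 if value > threshold else 0 for value in profile]


def jaccard(a, b):
    """Jaccard index of two binary vectors, treated as sets of active points."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def jdc_like_scores(edges, expression):
    """Score each protein by summed Jaccard similarity to its PPI neighbours."""
    states = {protein: binarize(values) for protein, values in expression.items()}
    neighbours = {protein: set() for protein in expression}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {
        protein: sum(jaccard(states[protein], states[other]) for other in linked)
        for protein, linked in neighbours.items()
    }


# Hypothetical three-protein network with four expression time points each.
edges = [("A", "B"), ("A", "C")]
expression = {"A": [1, 5, 5, 1], "B": [0, 4, 6, 0], "C": [9, 1, 1, 9]}
print(jdc_like_scores(edges, expression))
```

In this toy network A and B are active at the same time points and reinforce each other, while C is active only when A is inactive and contributes nothing to either score.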
Affiliation(s)
- Jiancheng Zhong
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha, 410083, China
- Chao Tang
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Wei Peng
  - College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, Yunnan, China
- Minzhu Xie
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Yusui Sun
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Qiang Tang
  - College of Engineering and Design, Hunan Normal University, Changsha, 410081, China
- Qiu Xiao
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Jiahong Yang
  - School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
34
Halo JV, Pendleton AL, Shen F, Doucet AJ, Derrien T, Hitte C, Kirby LE, Myers B, Sliwerska E, Emery S, Moran JV, Boyko AR, Kidd JM. Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes. Proc Natl Acad Sci U S A 2021; 118:e2016274118. [PMID: 33836575 PMCID: PMC7980453 DOI: 10.1073/pnas.2016274118] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022] Open
Abstract
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3' end of LINE-1_Cfs (i.e., LINE-1_Cf 3'-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
Affiliation(s)
- Julia V Halo
  - Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Amanda L Pendleton
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Feichen Shen
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Aurélien J Doucet
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
  - Université Côte d'Azur, CNRS, INSERM, Institut de Recherche sur le Cancer et le Vieillissement de Nice, F-06100 Nice, France
- Thomas Derrien
  - Université de Rennes 1, CNRS, Institut de Génétique et Développement de Rennes-UMR 6290, F-35000 Rennes, France
- Christophe Hitte
  - Université de Rennes 1, CNRS, Institut de Génétique et Développement de Rennes-UMR 6290, F-35000 Rennes, France
- Laura E Kirby
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Bridget Myers
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Elzbieta Sliwerska
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- Sarah Emery
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- John V Moran
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
  - Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109
- Adam R Boyko
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY 14850
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
  - Department Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
35
Guan J, Xu Y, Yu Y, Fu J, Ren F, Guo J, Zhao J, Jiang Q, Wei J, Xie H. Genome structure variation analyses of peach reveal population dynamics and a 1.67 Mb causal inversion for fruit shape. Genome Biol 2021; 22:13. [PMID: 33402202 PMCID: PMC7784018 DOI: 10.1186/s13059-020-02239-1] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 07/10/2020] [Accepted: 12/14/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Structural variations (SVs), a major source of genomic variation, can have profound consequences on phenotypic variation, yet the impacts of SVs remain largely unexplored in crops. RESULTS Here, we generate a high-quality de novo genome assembly for a flat-fruit peach cultivar and produce a comprehensive SV map for peach, as a high proportion of genomic sequence is occupied by heterozygous SVs in the peach genome. We conduct population-level analyses that indicate SVs have undergone strong purifying selection during peach domestication, and find evidence of positive selection, with a significant preference for upstream and intronic regions during later peach improvement. We perform an SV-based GWAS that identifies a large 1.67-Mb heterozygous inversion that segregates perfectly with flat-fruit shape. Mechanistically, this derived allele alters the expression of the PpOFP2 gene positioned near the proximal breakpoint of the inversion, and we confirm in transgenic tomatoes that PpOFP2 is causal for flat-fruit shape. CONCLUSIONS Thus, beyond introducing new genomics resources for peach research, our study illustrates how focusing on SV data can drive basic functional discoveries in plant science.
Affiliation(s)
- Jiantao Guan
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
- Yaoguang Xu
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
- Yang Yu
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
- Jun Fu
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
- Fei Ren
  - Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
- Jiying Guo
  - Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
- Jianbo Zhao
  - Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
- Quan Jiang
  - Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
- Jianhua Wei
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
- Hua Xie
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, People's Republic of China
  - Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing, People's Republic of China
36
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 169] [Impact Index Per Article: 56.3] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
37
Uchiyama Y, Yamaguchi D, Iwama K, Miyatake S, Hamanaka K, Tsuchida N, Aoi H, Azuma Y, Itai T, Saida K, Fukuda H, Sekiguchi F, Sakaguchi T, Lei M, Ohori S, Sakamoto M, Kato M, Koike T, Takahashi Y, Tanda K, Hyodo Y, Honjo RS, Bertola DR, Kim CA, Goto M, Okazaki T, Yamada H, Maegaki Y, Osaka H, Ngu LH, Siew CG, Teik KW, Akasaka M, Doi H, Tanaka F, Goto T, Guo L, Ikegawa S, Haginoya K, Haniffa M, Hiraishi N, Hiraki Y, Ikemoto S, Daida A, Hamano SI, Miura M, Ishiyama A, Kawano O, Kondo A, Matsumoto H, Okamoto N, Okanishi T, Oyoshi Y, Takeshita E, Suzuki T, Ogawa Y, Handa H, Miyazono Y, Koshimizu E, Fujita A, Takata A, Miyake N, Mizuguchi T, Matsumoto N. Efficient detection of copy-number variations using exome data: Batch- and sex-based analyses. Hum Mutat 2020; 42:50-65. [PMID: 33131168 DOI: 10.1002/humu.24129] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 06/20/2020] [Revised: 09/29/2020] [Accepted: 10/15/2020] [Indexed: 12/16/2022]
Abstract
Many algorithms to detect copy number variations (CNVs) using exome sequencing (ES) data have been reported and evaluated on their sensitivity and specificity, reproducibility, and precision. However, operational optimization of such algorithms for a better performance has not been fully addressed. ES of 1199 samples including 763 patients with different disease profiles was performed. ES data were analyzed to detect CNVs by both the eXome Hidden Markov Model (XHMM) and modified Nord's method. To efficiently detect rare CNVs, we aimed to decrease sequencing biases by analyzing, at the same time, the data of all unrelated samples sequenced in the same flow cell as a batch, and to eliminate sex effects of X-linked CNVs by analyzing female and male sequences separately. We also applied several filtering steps for more efficient CNV selection. The average number of CNVs detected in one sample was <5. This optimization together with targeted CNV analysis by Nord's method identified pathogenic/likely pathogenic CNVs in 34 patients (4.5%, 34/763). In particular, among 142 patients with epilepsy, the current protocol detected clinically relevant CNVs in 19 (13.4%) patients, whereas the previous protocol identified them in only 14 (9.9%) patients. Thus, this batch-based XHMM analysis efficiently selected rare pathogenic CNVs in genetic diseases.
Affiliation(s)
- Yuri Uchiyama
  - Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Kazuhiro Iwama
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Satoko Miyatake
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan
- Kohei Hamanaka
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Naomi Tsuchida
  - Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Aoi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
- Yoshiteru Azuma
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Toshiyuki Itai
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ken Saida
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Hiromi Fukuda
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Futoshi Sekiguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohiro Sakaguchi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Ming Lei
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Sachiko Ohori
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Masamune Sakamoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Pediatrics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Mitsuhiro Kato
  - Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan
- Takayoshi Koike
  - National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Yukitoshi Takahashi
  - National Epilepsy Center, NHO Shizuoka Institute of Epilepsy and Neurological Disorders, Shizuoka, Japan
- Koichi Tanda
  - Department of Pediatrics, Japanese Red Cross Kyoto Daiichi Hospital, Kyoto, Japan
- Yuki Hyodo
  - Department of Child Neurology, Okayama University Hospital, Okayama, Japan
- Rachel S Honjo
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Debora Romeo Bertola
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Chong Ae Kim
  - Unidade de Genetica do Instituto da Crianca do Hospital das Clinicas da Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, Brazil
- Masahide Goto
  - Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Tetsuya Okazaki
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hiroyuki Yamada
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Yoshihiro Maegaki
  - Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan
- Hitoshi Osaka
  - Department of Pediatrics, Jichi Medical University, Shimotsuke, Japan
- Lock-Hock Ngu
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Ch'ng G Siew
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Keng W Teik
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Manami Akasaka
  - Department of Pediatrics, Iwate Medical University School of Medicine, Morioka, Japan
- Hiroshi Doi
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Fumiaki Tanaka
  - Department of Neurology and Stroke Medicine, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Tomohide Goto
  - Division of Neurology, Kanagawa Children's Medical Center, Yokohama, Japan
- Long Guo
  - Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Shiro Ikegawa
  - Laboratory for Bone and Joint Diseases, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan
- Kazuhiro Haginoya
  - Department of Pediatric Neurology, Miyagi Children's Hospital, Sendai, Japan
- Muzhirah Haniffa
  - Department of Genetics, Kuala Lumpur Hospital, Kuala Lumpur, Malaysia
- Nozomi Hiraishi
  - Department of Pediatrics, Yokohama City University Medical Center, Yokohama, Japan
| | - Yoko Hiraki
- Hiroshima Municipal Center for Child Health and Development, Hiroshima, Japan
| | - Satoru Ikemoto
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
| | - Atsuro Daida
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
| | - Shin-Ichiro Hamano
- Division of Neurology, Saitama Children's Medical Center, Saitama, Japan
| | - Masaki Miura
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan.,Department of Pediatrics, Nagaoka Red Cross Hospital, Nagaoka, Japan
| | - Akihiko Ishiyama
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
| | - Osamu Kawano
- Department of Pediatrics, Hokkaido University Hospital, Sapporo, Japan
| | - Akane Kondo
- Clinical Genetics Center, Shikoku Medical Center for Children and Adults, National Hospital Organization, Kagawa, Japan
| | - Hiroshi Matsumoto
- Department of Pediatrics, National Defense Medical College, Saitama, Japan
| | - Nobuhiko Okamoto
- Department of Medical Genetics, Osaka Women's and Children's Hospital, Osaka, Japan
| | - Tohru Okanishi
- Department of Brain and Neurosciences, Division of Child Neurology, Faculty of Medicine, Tottori University, Yonago, Japan.,Department of Child Neurology, Comprehensive Epilepsy Center, Seirei Hamamatsu General Hospital, Hamamatsu, Japan
| | - Yukimi Oyoshi
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
| | - Eri Takeshita
- Department of Child Neurology, National Center Hospital, National Center of Neurology and Psychiatry, Tokyo, Japan
| | - Toshifumi Suzuki
- Department of Obstetrics and Gynecology, Faculty of Medicine Juntendo University, Tokyo, Japan
| | - Yoshiyuki Ogawa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
| | - Hiroshi Handa
- Department of Hematology, Gunma University Graduate School of Medicine, Gunma, Japan
| | - Yayoi Miyazono
- Department of Child Health, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
| | - Eriko Koshimizu
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Atsushi Fujita
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Atsushi Takata
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Noriko Miyake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| |
Collapse
|
38
|
Criscuolo A. On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference. F1000Res 2020; 9:1309. [PMID: 33335719 PMCID: PMC7713896 DOI: 10.12688/f1000research.26930.1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 10/12/2020] [Indexed: 12/29/2022]
Abstract
Recently developed MinHash-based techniques have proven successful in quickly estimating the level of similarity between large nucleotide sequences. This article discusses their practical use and limitations in approximating uncorrected distances between genomes, and in transforming these pairwise dissimilarities into proper evolutionary distances. It is notably shown that complex distance measures can be easily approximated using simple transformation formulae based on a few parameters. MinHash-based techniques can therefore be very useful for implementing fast yet accurate alignment-free phylogenetic reconstruction procedures from large sets of genomes. This last point is assessed with a simulation study using a dedicated bioinformatics tool.
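As a rough illustration of the pipeline the abstract describes, the sketch below estimates a Jaccard index from bottom-s MinHash sketches, converts it to an uncorrected distance with the widely used Mash formula, and then applies a Jukes-Cantor (JC69) correction. The JC69 step is an assumed stand-in for "a simple transformation formula"; it is not necessarily one of the formulae proposed in the article, and all function names here are hypothetical.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer):
    # Stable 64-bit hash taken from the first 8 bytes of SHA-1.
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(seq, k, s):
    """Bottom-s sketch: the s smallest hash values over the k-mers of seq."""
    return sorted({_hash(km) for km in kmers(seq, k)})[:s]


def jaccard_estimate(sk_a, sk_b, s):
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:s]  # s smallest of the union
    if not merged:
        return 0.0
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Uncorrected per-site distance from a Jaccard index j (Mash formula)."""
    if j <= 0.0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2.0 * j / (1.0 + j)) / k))


def jc69(p):
    """Jukes-Cantor correction (assumed example transform, not the article's)."""
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance always exceeds the uncorrected one for 0 < p < 0.75, which is the point of such transformations: they compensate for multiple substitutions at the same site.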
Affiliation(s)
- Alexis Criscuolo
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, 75015 Paris, France
|
39
|
Feng C, Dai M, Liu Y, Chen M. Sequence repetitiveness quantification and de novo repeat detection by weighted k-mer coverage. Brief Bioinform 2020; 22:5855256. [PMID: 32591772 DOI: 10.1093/bib/bbaa086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2020] [Revised: 04/10/2020] [Accepted: 04/22/2020] [Indexed: 11/12/2022]
Abstract
DNA repeats are abundant in eukaryotic genomes and have been shown to play a vital role in genome evolution and regulation. Many approaches have been proposed to identify the various repeats in a genome. Some de novo repeat identification tools efficiently generate sequence repetitiveness scores based on k-mer counting for repeat detection. However, we noticed that these tools can still be improved in terms of repetitiveness score calculation, sensitivity to segmental duplications, and detection specificity. Therefore, we present a new computational approach named Repeat Locator (RepLoc), which uses weighted k-mer coverage to quantify genome sequence repetitiveness and locate repetitive sequences. According to the repetitiveness map of the human genome generated by RepLoc, there may be relationships between sequence repetitiveness and genome structures. A comprehensive benchmark shows that RepLoc is a more efficient k-mer-counting-based tool for de novo repeat detection. The RepLoc software is freely available at http://bis.zju.edu.cn/reploc.
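To make the idea of a k-mer-based repetitiveness score concrete, here is a minimal sketch that scores each base by the mean sequence-wide count of the k-mers overlapping it. This is an unweighted simplification under assumed definitions; it does not reproduce RepLoc's weighting scheme, and the function name `kmer_coverage` is hypothetical.

```python
from collections import Counter


def kmer_coverage(seq, k):
    """Per-base repetitiveness: mean count of the k-mers covering each position.

    Unweighted illustration only; RepLoc's actual weighting is not modelled.
    """
    n = len(seq)
    if n < k:
        return [0.0] * n
    # Count how often every k-mer occurs in the whole sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    total = [0] * n  # summed counts of k-mers covering each base
    cover = [0] * n  # number of k-mers covering each base
    for i in range(n - k + 1):
        c = counts[seq[i:i + k]]
        for j in range(i, i + k):
            total[j] += c
            cover[j] += 1
    return [t / c for t, c in zip(total, cover)]
```

Bases inside repeated regions receive scores above 1, while bases covered only by unique k-mers score exactly 1.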
Affiliation(s)
- Cong Feng
- Ming Chen's laboratory in Zhejiang University
- Min Dai
- Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University
|
40
|
Li Q, Ramasamy S, Singh P, Hagel JM, Dunemann SM, Chen X, Chen R, Yu L, Tucker JE, Facchini PJ, Yeaman S. Gene clustering and copy number variation in alkaloid metabolic pathways of opium poppy. Nat Commun 2020; 11:1190. [PMID: 32132540 PMCID: PMC7055283 DOI: 10.1038/s41467-020-15040-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/08/2018] [Accepted: 02/10/2020] [Indexed: 02/08/2023]
Abstract
Genes in plant secondary metabolic pathways enable biosynthesis of a range of medically and industrially important compounds, and are often clustered on chromosomes. Here, we study genomic clustering in the benzylisoquinoline alkaloid (BIA) pathway in opium poppy (Papaver somniferum), exploring relationships between gene expression, copy number variation, and metabolite production. We use Hi-C to improve the existing draft genome assembly, yielding chromosome-scale scaffolds that include 35 previously unanchored BIA genes. We find that co-expression of BIA genes increases within clusters and identify candidates with unknown function based on clustering and covariation in expression and alkaloid production. Copy number variation in critical BIA genes correlates with stark differences in alkaloid production, linking noscapine production with an 11-gene deletion, and increased thebaine/decreased morphine production with deletion of a T6ODM cluster. Our results show that the opium poppy genome is still dynamically evolving in ways that contribute to medically and industrially important phenotypes.
Affiliation(s)
- Qiushi Li
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Sukanya Ramasamy
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Pooja Singh
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Jillian M Hagel
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Willow Biosciences Inc., 3655 36 Street N.W., Calgary, Alberta, T2L 1Y8, Canada
- Sonja M Dunemann
- Department of Ecosystem and Public Health, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Xue Chen
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Willow Biosciences Inc., 3655 36 Street N.W., Calgary, Alberta, T2L 1Y8, Canada
- Rongji Chen
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Lisa Yu
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Joseph E Tucker
- Willow Biosciences Inc., 3655 36 Street N.W., Calgary, Alberta, T2L 1Y8, Canada
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Peter J Facchini
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
- Willow Biosciences Inc., 3655 36 Street N.W., Calgary, Alberta, T2L 1Y8, Canada
- Sam Yeaman
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, T2N 1N4, Canada
|
41
|
Liu Y, Zhang LY, Li J. Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers. Bioinformatics 2019; 35:4560-4567. [DOI: 10.1093/bioinformatics/btz273] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/23/2019] [Revised: 03/31/2019] [Accepted: 04/11/2019] [Indexed: 11/13/2022]
Abstract
Motivation
Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach which saves the computational resources by avoiding a significant number of unnecessary matches.
Results
Under this framework, we propose a new method to detect all MEMs between a pair of genomes. The method first performs a fixed sampling of k-mers on the query sequence and adds the selected k-mers to a Bloom filter. All k-mers of the reference sequence are then tested against the Bloom filter; a k-mer that passes the test is inserted into a hash table for indexing. Compared with existing methods, far fewer query k-mers are generated and far fewer k-mers are inserted into the index, which avoids unnecessary matches and leads to an efficient matching process and memory savings. Experiments on large genomes demonstrate that our method is at least 1.8 times faster than the best of the existing algorithms. This performance is mainly attributed to the key novelty of our method: the fixed k-mer sampling is conducted on the query sequence, and the index k-mers are filtered from the reference sequence via a Bloom filter.
Availability and implementation
https://github.com/yuansliu/bfMEM
Supplementary information
Supplementary data are available at Bioinformatics online.
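The three steps described in the Results (fixed sampling of query k-mers into a Bloom filter, filtering reference k-mers through it, and indexing the survivors in a hash table) can be sketched as follows. This is an illustrative toy, not the bfMEM implementation: the sampling step `min_len - k + 1`, the seed-and-extend phase, the Bloom filter parameters, and all names are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter; the size and hash count are arbitrary choices."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def find_mems(ref, query, k, min_len):
    """Return sorted (ref_pos, query_pos, length) for MEMs of length >= min_len."""
    if min_len < k:
        raise ValueError("min_len must be at least k")
    # Assumed step: every match of length >= min_len then contains a sampled k-mer.
    step = min_len - k + 1
    sampled = range(0, len(query) - k + 1, step)

    # Step 1: fixed sampling of query k-mers into the Bloom filter.
    bloom = BloomFilter()
    for q in sampled:
        bloom.add(query[q:q + k])

    # Step 2: index only the reference k-mers that pass the filter.
    index = defaultdict(list)
    for r in range(len(ref) - k + 1):
        km = ref[r:r + k]
        if km in bloom:
            index[km].append(r)

    # Step 3 (assumed): look up sampled query k-mers and extend each seed.
    mems = set()
    for q in sampled:
        for r in index.get(query[q:q + k], ()):
            lo_r, lo_q = r, q
            while lo_r > 0 and lo_q > 0 and ref[lo_r - 1] == query[lo_q - 1]:
                lo_r -= 1
                lo_q -= 1
            hi_r, hi_q = r + k, q + k
            while (hi_r < len(ref) and hi_q < len(query)
                   and ref[hi_r] == query[hi_q]):
                hi_r += 1
                hi_q += 1
            if hi_r - lo_r >= min_len:
                mems.add((lo_r, lo_q, hi_r - lo_r))
    return sorted(mems)
```

For example, `find_mems("TTTTACGTACGGATCCAAAA", "GGGGACGTACGGATCCGGGG", 4, 8)` returns `[(4, 4, 12)]`. Bloom filter false positives only add unused entries to the index; they cannot produce false matches, because seeds are looked up by exact k-mer.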
Affiliation(s)
- Yuansheng Liu
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
- Leo Yu Zhang
- School of Information Technology, Deakin University, VIC 3216, Australia
- Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
|