1
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024; 630:401-411. [PMID: 38811727 PMCID: PMC11168930 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
- Brandon D Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bornberg-Bauer
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health and Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yong-Hwee E Loh
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kelly M McGarvey
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joana L Rocha
  - University of California Berkeley, Berkeley, CA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alice C Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
2
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
- Brandon D. Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bomberg
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G. Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y. Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health & Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H. Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A. Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H. Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J. Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alice C. Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V. Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M. Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
3
Paparella A, L’Abbate A, Palmisano D, Chirico G, Porubsky D, Catacchio CR, Ventura M, Eichler EE, Maggiolini FAM, Antonacci F. Structural Variation Evolution at the 15q11-q13 Disease-Associated Locus. Int J Mol Sci 2023; 24:15818. [PMID: 37958807 PMCID: PMC10648317 DOI: 10.3390/ijms242115818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 10/26/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023]
Abstract
The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader-Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species primarily by accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies its association with human susceptibility to recurrent disease-associated rearrangements.
Affiliation(s)
- Annalisa Paparella
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Alberto L’Abbate
  - Institute of Biomembranes, Bioenergetics, and Molecular Biotechnology (IBIOM), 70125 Bari, Italy
- Donato Palmisano
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Gerardina Chirico
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Claudia R. Catacchio
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Mario Ventura
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute (HHMI), University of Washington, Seattle, WA 98195, USA
- Flavia A. M. Maggiolini
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
  - Research Centre for Viticulture and Enology, Council for Agricultural Research and Economics (CREA), 70010 Bari, Italy
- Francesca Antonacci
  - Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70125 Bari, Italy
4
Poszewiecka B, Gogolewski K, Karolak JA, Stankiewicz P, Gambin A. PhaseDancer: a novel targeted assembler of segmental duplications unravels the complexity of the human chromosome 2 fusion going from 48 to 46 chromosomes in hominin evolution. Genome Biol 2023; 24:205. [PMID: 37697406 PMCID: PMC10496407 DOI: 10.1186/s13059-023-03022-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 07/25/2023] [Indexed: 09/13/2023]
Abstract
Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler, PhaseDancer, that extends SD-rich regions of interest iteratively. We validate its robustness and efficiency using a gold-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete complex SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.
Affiliation(s)
- Barbara Poszewiecka
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Krzysztof Gogolewski
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Justyna A. Karolak
  - Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, 77030 Houston, TX USA
  - Chair and Department of Genetics and Pharmaceutical Microbiology, Poznan University of Medical Sciences, 60-806 Poznan, Poland
- Paweł Stankiewicz
  - Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, 77030 Houston, TX USA
- Anna Gambin
  - Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
5
Estimating the age of single nucleotide polymorphic sites in humans. Genes Genomics 2021; 43:1179-1188. [PMID: 34245420 DOI: 10.1007/s13258-021-01135-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2021] [Accepted: 06/28/2021] [Indexed: 10/20/2022]
Abstract
BACKGROUND: Determining the ages of polymorphic sites in human genomes requires a careful balance between the complexity of the approach and the desired accuracy.
OBJECTIVE: We provide evidence that a simpler approach, in which age determination is based upon the degree to which the alternative allele is spread among the population, can be competitive with more complex methods.
METHODS: The information contained in the VCF files of Phase 1 of the 1000 Genomes Project, combined with the genomic sequences of seven non-human primate species, was analyzed. The analyses were supplemented by a computer simulation of the mutational changes in 10,000 model chromosomes with a length of 10,000 nucleotides over a period of 16 million years. The list of the birth dates of the derived alleles of homozygous and heterozygous components of the derived alleles served as a yardstick to estimate the ages of human alternative alleles.
RESULTS: The derived alleles born in Africa before the "Out of Africa" event and not brought to other continents are estimated to be 0.17 Ma old, the derived alleles present simultaneously on all continents are estimated to be 1.3 Ma old, and the alleles arising in Europe or Asia are estimated to be 0.06 Ma old.
CONCLUSION: Our approach works only for polymorphisms that respect the "more frequent means older" principle. However, this limitation leads to disagreement with the Atlas of Variant Age in only about 20% of cases.
6
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, Montinaro F, Gordon DS, Storer JM, Haukness M, Fiddes IT, Murali SC, Dishuck PC, Hsieh P, Harvey WT, Audano PA, Mercuri L, Piccolo I, Antonacci F, Munson KM, Lewis AP, Baker C, Underwood JG, Hoekzema K, Huang TH, Sorensen M, Walker JA, Hoffman J, Thibaud-Nissen F, Salama SR, Pang AWC, Lee J, Hastie AR, Paten B, Batzer MA, Diekhans M, Ventura M, Eichler EE. A high-quality bonobo genome refines the analysis of hominid evolution. Nature 2021; 594:77-81. [PMID: 33953399 PMCID: PMC8172381 DOI: 10.1038/s41586-021-03519-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/21/2020] [Accepted: 04/07/2021] [Indexed: 12/17/2022]
Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3–5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome. A high-quality bonobo genome assembly provides insights into incomplete lineage sorting in hominids and its relevance to gene evolution and the genetic relationship among living hominids.
Affiliation(s)
- Yafei Mao
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- LaDeana W Hillier
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ruiyang Li
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jason D Fernandes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Francesco Montinaro
  - Department of Biology, University of Bari, Bari, Italy
  - Estonian Biocentre, Institute of Genomics, Tartu, Estonia
- David S Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Shwetha Canchi Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Philip C Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter A Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Tzu-Hsueh Huang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jerilyn A Walker
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Jinna Hoffman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Sofie R Salama
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Joyce Lee
  - Bionano Genomics, San Diego, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark A Batzer
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mario Ventura
  - Department of Biology, University of Bari, Bari, Italy
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
7
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 169] [Impact Index Per Article: 56.3] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
8
Li J, Fan Z, Shen F, Pendleton AL, Song Y, Xing J, Yue B, Kidd JM, Li J. Genomic Copy Number Variation Study of Nine Macaca Species Provides New Insights into Their Genetic Divergence, Adaptation, and Biomedical Application. Genome Biol Evol 2020; 12:2211-2230. [PMID: 32970804 PMCID: PMC7846157 DOI: 10.1093/gbe/evaa200] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 09/19/2020] [Indexed: 02/06/2023]
Abstract
Copy number variation (CNV) can promote phenotypic diversification and adaptive evolution. However, the genomic architecture of CNVs among Macaca species remains poorly documented, and the roles of CNVs in adaptation and evolution of macaques have not been well addressed. Here, we identified and characterized 1,479 genome-wide hetero-specific CNVs across nine Macaca species with bioinformatic methods, along with 26 CNV-dense regions and dozens of lineage-specific CNVs. The genes intersecting CNVs were overrepresented in nutritional metabolism, xenobiotics/drug metabolism, and immune-related pathways. Population-level transcriptome data showed that nearly 46% of CNV genes were differentially expressed across populations and also mainly consisted of metabolic and immune-related genes, which implied a role of CNVs in environmental adaptation of Macaca. Several CNVs overlapping drug metabolism genes were verified with genomic quantitative polymerase chain reaction, suggesting that these macaques may have different drug metabolism features. The CNV-dense regions, including 15 first reported here, represent unstable genomic segments in macaques where biological innovation may evolve. Twelve gains and 40 losses specific to the Barbary macaque contain genes with essential roles in energy homeostasis and immune defense, suggesting a genetic basis for its unique distribution in North Africa. Our study not only elucidated the genetic diversity across Macaca species from the perspective of structural variation but also provided suggestive evidence for the role of CNVs in adaptation and genome evolution. Additionally, our findings provide new insights into the use of diverse macaque species in drug studies.
Affiliation(s)
- Jing Li
  - Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
  - Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, China
- Zhenxin Fan
  - Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Feichen Shen
  - Department of Human Genetics, Medical School, University of Michigan
- Yang Song
  - Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jinchuan Xing
  - Department of Genetics and the Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway
- Bisong Yue
  - Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Jeffrey M Kidd
  - Department of Human Genetics, Medical School, University of Michigan
- Jing Li
  - Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
9
Single-cell strand sequencing of a macaque genome reveals multiple nested inversions and breakpoint reuse during primate evolution. Genome Res 2020; 30:1680-1693. [PMID: 33093070 PMCID: PMC7605249 DOI: 10.1101/gr.265322.120] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/30/2020] [Accepted: 09/02/2020] [Indexed: 12/14/2022]
Abstract
Rhesus macaque is an Old World monkey that shared a common ancestor with humans ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions between 859 bp and 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is always derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach, combining single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding the biology and evolution of primate species.
10
Santesmasses D, Mariotti M, Gladyshev VN. Tolerance to Selenoprotein Loss Differs between Human and Mouse. Mol Biol Evol 2020; 37:341-354. [PMID: 31560400 PMCID: PMC6993852 DOI: 10.1093/molbev/msz218] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/11/2022] Open
Abstract
Mouse has emerged as the most common model organism in biomedicine. Here, we analyzed the tolerance to the loss-of-function (LoF) of selenoprotein genes, estimated from mouse knockouts and the frequency of LoF variants in humans. We found not only a general correspondence in tolerance (e.g., GPX1, GPX2) and intolerance (TXNRD1, SELENOT) to gene LoF between humans and mice but also important differences. Notably, humans are intolerant to the loss of iodothyronine deiodinases, whereas their deletion in mice leads to mild phenotypes, and this is consistent with phenotype differences in selenocysteine machinery loss between these species. In contrast, loss of TXNRD2 and GPX4 is lethal in mice but may be tolerated in humans. We further identified the first human SELENOP variants coding for proteins varying in selenocysteine content. Finally, our analyses suggested that premature termination codons in selenoprotein genes trigger nonsense-mediated decay, but do this inefficiently when a UGA codon is gained. Overall, our study highlights differences in the physiological importance of selenoproteins between human and mouse.
Affiliation(s)
- Didac Santesmasses
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Marco Mariotti
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Vadim N Gladyshev
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
11
Recurrent inversion toggling and great ape genome evolution. Nat Genet 2020; 52:849-858. [PMID: 32541924 PMCID: PMC7415573 DOI: 10.1038/s41588-020-0646-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/10/2019] [Accepted: 05/15/2020] [Indexed: 01/14/2023]
Abstract
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
12
Hirai H, Hirai Y, Udono T, Matsubayashi K, Tosi AJ, Koga A. Structural variations of subterminal satellite blocks and their source mechanisms as inferred from the meiotic configurations of chimpanzee chromosome termini. Chromosome Res 2019; 27:321-332. [PMID: 31418128 DOI: 10.1007/s10577-019-09615-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/21/2019] [Revised: 07/09/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]
Abstract
African great apes have large constitutive heterochromatin (C-band) blocks in subtelomeric regions of the majority of their chromosomes, but humans lack these. Additionally, the chimpanzee meiotic cell division process demonstrates unique partial terminal associations in the first meiotic prophase (pachytene). These are likely formed as a result of interaction among subtelomeric C-band blocks. We thus conducted an extensive study to define the features in the subtelomeric heterochromatic regions of chimpanzee chromosomes undergoing mitotic metaphase and meiotic cell division. Molecular cytogenetic analyses with probes of both subterminal satellite DNA (a main component of C-band) and rDNA demonstrated principles of interaction among DNA arrays. The results suggest that homologous and ectopic recombination through persistent subtelomeric associations (post-bouquet association observed in 32% of spermatocytes in the pachytene stage) appears to create variability in heterochromatin patterns and simultaneously restrain subtelomeric genome polymorphisms. That is, the meeting of non-homologous chromosome termini sets the stage for ectopic pairing which, in turn, is the mechanism for generating variability and genomic dispersion of subtelomeric C-band blocks through a system of concerted evolution. Comparison between the present study and previous reports indicated that the chromosomal distribution rate of subtelomeric regions seems to have antagonistic correlation with arm numbers holding subterminal satellite blocks in humans, chimpanzees, and gorillas. That is, the increase of subterminal satellite blocks probably reduces genomic diversity in the subtelomeric regions. The acquisition vs. loss of the subtelomeric C-band blocks is postulated as the underlying engine of this chromosomal differentiation yielded by meiotic chromosomal interaction.
Affiliation(s)
- Hirohisa Hirai
  - Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
  - The Unit of Human-Nature Interlaced Life Science, Kyoto University Research Coordination Alliance, Kyoto, Japan
- Yuriko Hirai
  - Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- Toshifumi Udono
  - Kumamoto Sanctuary, Wildlife Research Center, Kyoto University, Uto, Kumamoto, Japan
- Anthony J Tosi
  - Department of Anthropology and School of Biomedical Science, Kent State University, Kent, OH, 44242, USA
- Akihiko Koga
  - Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
13
Kahveci F, Alkan C. Whole-Genome Shotgun Sequence CNV Detection Using Read Depth. Methods in Molecular Biology (Clifton, N.J.) 2019; 1833:61-72. [PMID: 30039363 DOI: 10.1007/978-1-4939-8666-8_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
With the developments in high-throughput sequencing (HTS) technologies, researchers have gained a powerful tool to identify structural variants (SVs) in genomes with substantially less cost than before. SVs can be broadly classified into two main categories: balanced rearrangements and copy number variations (CNVs). Many algorithms have been developed to characterize CNVs using HTS data, with focus on different types and size range of variants using different read signatures. Read depth (RD) based tools are more common in characterizing large (>10 kb) CNVs since RD strategy does not rely on the fragment size and read length, which are limiting factors in read pair and split read analysis. Here we provide a guideline for a user friendly tool for detecting large segmental duplications and deletions that can also predict integer copy numbers for duplicated genes.
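The read-depth idea summarized in this abstract can be illustrated with a short sketch. It is a generic illustration, not the tool the chapter describes: the function name `estimate_copy_numbers`, the window depths, and the control depths are all hypothetical, and it assumes that depth scales linearly with copy number and that the control windows are diploid. Real pipelines typically also correct for GC content and mappability, which is omitted here.

```python
from statistics import median


def estimate_copy_numbers(window_depths, control_depths, ploidy=2):
    """Estimate an integer copy number for each window from read depth.

    window_depths:  mean read depth per genomic window (hypothetical input)
    control_depths: depths of windows assumed to be at normal copy number
    ploidy:         copy number assumed for the control windows
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    # Depth is assumed proportional to copy number: rescale, then round.
    return [round(ploidy * depth / baseline) for depth in window_depths]


if __name__ == "__main__":
    controls = [29.0, 30.0, 31.0, 30.5, 29.5]  # made-up diploid windows
    windows = [30.2, 14.8, 61.0, 0.3]          # made-up test windows
    print(estimate_copy_numbers(windows, controls))
```

With these made-up numbers the four windows come out as a normal region, a heterozygous deletion, a duplication to four copies, and a homozygous deletion.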
Affiliation(s)
- Fatma Kahveci
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
- Can Alkan
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
14
Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, Underwood JG, Nelson BJ, Chaisson MJP, Dougherty ML, Munson KM, Hastie AR, Diekhans M, Hormozdiari F, Lorusso N, Hoekzema K, Qiu R, Clark K, Raja A, Welch AE, Sorensen M, Baker C, Fulton RS, Armstrong J, Graves-Lindsay TA, Denli AM, Hoppe ER, Hsieh P, Hill CM, Pang AWC, Lee J, Lam ET, Dutcher SK, Gage FH, Warren WC, Shendure J, Haussler D, Schneider VA, Cao H, Ventura M, Wilson RK, Paten B, Pollen A, Eichler EE. High-resolution comparative analysis of great ape genomes. Science 2018; 360:eaar6343. [PMID: 29880660 PMCID: PMC6178954 DOI: 10.1126/science.aar6343] [Citation(s) in RCA: 231] [Impact Index Per Article: 38.5] [Received: 12/07/2017] [Accepted: 04/02/2018] [Indexed: 12/22/2022]
Abstract
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
Affiliation(s)
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ian T Fiddes
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- David Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Shwetha Murali
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Stuart Cantsilieris
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Olivia S Meyerson
  - Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
- Jason G Underwood
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Pacific Biosciences (PacBio) of California, Inc., Menlo Park, CA 94025, USA
- Bradley J Nelson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Mark J P Chaisson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Computational Biology and Bioinformatics, University of Southern California, Los Angeles, CA 90089, USA
- Max L Dougherty
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Fereydoun Hormozdiari
  - Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA 95817, USA
- Nicola Lorusso
  - Department of Biology, University of Bari, Aldo Moro, Bari 70121, Italy
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ruolan Qiu
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Karen Clark
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Archana Raja
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- AnneMarie E Welch
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Melanie Sorensen
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Robert S Fulton
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Joel Armstrong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Tina A Graves-Lindsay
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Ahmet M Denli
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Emma R Hoppe
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher M Hill
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joyce Lee
  - Bionano Genomics, San Diego, CA 92121, USA
- Susan K Dutcher
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Fred H Gage
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Wesley C Warren
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Valerie A Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Han Cao
  - Bionano Genomics, San Diego, CA 92121, USA
- Mario Ventura
  - Department of Biology, University of Bari, Aldo Moro, Bari 70121, Italy
- Richard K Wilson
  - Departments of Medicine and Genetics, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alex Pollen
  - Department of Neurology, University of California, San Francisco, San Francisco, CA 94158, USA
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA 94143, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
15
Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. Entropy 2018; 20:e20060393. [PMID: 33265483 PMCID: PMC7512912 DOI: 10.3390/e20060393] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/03/2018] [Revised: 05/16/2018] [Accepted: 05/21/2018] [Indexed: 11/26/2022]
Abstract
An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we compare directly two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions; the NCD measures how similar both strings are (in terms of information content) and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state of the art DNA sequence compressor that we benchmark with some top compressors in different compression modes. Then, we apply the compressor on DNA sequences with different scales and natures, first using synthetic sequences and then on real DNA sequences. The last include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
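For readers unfamiliar with the first of these measures, the Normalized Compression Distance is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below is only an illustration under stated assumptions: it uses Python's general-purpose `zlib` as a stand-in for the specialized DNA compressor used in the paper, the helper names `compressed_size` and `ncd` are hypothetical, and the sequences are synthetic. The NRC is not shown because it requires a compressor that supports relative (reference-based) compression.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "genome": 2000 random bases.
    seq_a = bytes(rng.choice(b"ACGT") for _ in range(2000))
    # A close relative: the same sequence with a substitution every 100 bases.
    related = bytearray(seq_a)
    for i in range(0, len(related), 100):
        related[i] = ord("C") if related[i] == ord("A") else ord("A")
    # An unrelated sequence of the same length.
    unrelated = bytes(rng.choice(b"ACGT") for _ in range(2000))

    print("NCD(a, related)   =", round(ncd(seq_a, bytes(related)), 3))
    print("NCD(a, unrelated) =", round(ncd(seq_a, unrelated), 3))
```

The related pair should score much closer to 0 than the unrelated pair, which should score near 1. A general-purpose compressor makes the absolute values crude, so only the ordering is meaningful here.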
16
Catacchio CR, Maggiolini FAM, D'Addabbo P, Bitonto M, Capozzi O, Lepore Signorile M, Miroballo M, Archidiacono N, Eichler EE, Ventura M, Antonacci F. Inversion variants in human and primate genomes. Genome Res 2018; 28:910-920. [PMID: 29776991 PMCID: PMC5991517 DOI: 10.1101/gr.234831.118] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 01/19/2018] [Accepted: 04/26/2018] [Indexed: 02/06/2023]
Abstract
For many years, inversions have been proposed to be a direct driving force in speciation since they suppress recombination when heterozygous. Inversions are the most common large-scale differences among humans and great apes. Nevertheless, they represent large events easily distinguishable by classical cytogenetics, whose resolution, however, is limited. Here, we performed a genome-wide comparison between human, great ape, and macaque genomes using the net alignments for the most recent releases of genome assemblies. We identified a total of 156 putative inversions, between 103 kb and 91 Mb, corresponding to 136 human loci. Combining literature, sequence, and experimental analyses, we analyzed 109 of these loci and found 67 regions inverted in one or multiple primates, including 28 newly identified inversions. These events overlap with 81 human genes at their breakpoints, and seven correspond to sites of recurrent rearrangements associated with human disease. This work doubles the number of validated primate inversions larger than 100 kb, beyond what was previously documented. We identified 74 sites of errors, where the sequence has been assembled in the wrong orientation, in the reference genomes analyzed. Our data serve two purposes: First, we generated a map of evolutionary inversions in these genomes representing a resource for interrogating differences among these species at a functional level; second, we provide a list of misassembled regions in these primate genomes, involving over 300 Mb of DNA and 1978 human genes. Accurately annotating these regions in the genome references has immediate applications for evolutionary and biomedical studies on primates.
Affiliation(s)
- Pietro D'Addabbo
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
- Miriana Bitonto
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
- Oronzo Capozzi
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
- Mattia Miroballo
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Mario Ventura
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
- Francesca Antonacci
  - Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
17
Evolution and genomics of the human brain. Neurología (English Edition) 2018. [DOI: 10.1016/j.nrleng.2015.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
18
Regulation and function of avian selenogenome. Biochim Biophys Acta Gen Subj 2018; 1862:2473-2479. [PMID: 29627451 DOI: 10.1016/j.bbagen.2018.03.029] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/25/2018] [Revised: 03/27/2018] [Accepted: 03/29/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Selenium (Se) is an essential micronutrient required by avian species. Dietary Se/vitamin E deficiency induces three classical diseases in chicks: exudative diathesis, nutritional pancreatic atrophy, and nutritional muscular dystrophy. SCOPE OF REVIEW: This review is to summarize and analyze the evolution, regulation, and function of avian selenogenome and selenoproteome and their relationship with the three classical Se/vitamin E deficiency diseases. MAJOR CONCLUSIONS: There are 24 selenoproteins confirmed in chicks, with two avian-specific members (SELENOU and SELENOP2) and two missing mammalian members (GPX6 and SELENOV). There are two forms of SELENOP containing 1 or 13 selenocysteine residues. In addition, a Gallus gallus gene was conjectured to be the counterpart of the human SEPHS2. Expression of selenoprotein genes in the liver, pancreas, and muscle of chicks seemed to be highly responsive to dietary Se changes. Pathogeneses of the Se/vitamin E deficient diseases in the chicks were likely produced by missing functions of selected selenoproteins in regulating cellular and tissue redox balance and inhibiting oxidative/reductive stress-induced cell death. GENERAL SIGNIFICANCE: Gene knockout models, similar to those of rodents, will help characterize the precise functions of avian selenoproteins and their comparisons with those of mammalian species.
19
Eslami Rasekh M, Chiatante G, Miroballo M, Tang J, Ventura M, Amemiya CT, Eichler EE, Antonacci F, Alkan C. Discovery of large genomic inversions using long range information. BMC Genomics 2017; 18:65. [PMID: 28073353 PMCID: PMC5223412 DOI: 10.1186/s12864-016-3444-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/20/2016] [Accepted: 12/19/2016] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. RESULTS: Here we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. CONCLUSIONS: In this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR.
Affiliation(s)
- Marzieh Eslami Rasekh
  - Department of Computer Engineering, Bilkent University, Bilkent, 06800, Ankara, Turkey
- Giorgia Chiatante
  - Department of Biology, University of Bari, Via Orabona 4, 70125, Bari, Italy
- Mattia Miroballo
  - Department of Biology, University of Bari, Via Orabona 4, 70125, Bari, Italy
- Joyce Tang
  - Benaroya Research Institute, 1201 Ninth Avenue, 98101, Seattle, WA, USA
- Mario Ventura
  - Department of Biology, University of Bari, Via Orabona 4, 70125, Bari, Italy
- Chris T Amemiya
  - Benaroya Research Institute, 1201 Ninth Avenue, 98101, Seattle, WA, USA
- Evan E Eichler
  - Department of Genome Sciences and Howard Hughes Medical Institute, University of Washington, 3720 15th Avenue NE, 98195, Seattle, WA, USA
- Francesca Antonacci
  - Department of Biology, University of Bari, Via Orabona 4, 70125, Bari, Italy
- Can Alkan
  - Department of Computer Engineering, Bilkent University, Bilkent, 06800, Ankara, Turkey
20
Sotero-Caio CG, Platt RN, Suh A, Ray DA. Evolution and Diversity of Transposable Elements in Vertebrate Genomes. Genome Biol Evol 2017; 9:161-177. [PMID: 28158585 PMCID: PMC5381603 DOI: 10.1093/gbe/evw264] [Citation(s) in RCA: 147] [Impact Index Per Article: 21.0] [Accepted: 12/06/2016] [Indexed: 12/21/2022] Open
Abstract
Transposable elements (TEs) are selfish genetic elements that mobilize in genomes via transposition or retrotransposition and often make up large fractions of vertebrate genomes. Here, we review the current understanding of vertebrate TE diversity and evolution in the context of recent advances in genome sequencing and assembly techniques. TEs make up 4-60% of assembled vertebrate genomes, and deeply branching lineages such as ray-finned fishes and amphibians generally exhibit a higher TE diversity than the more recent radiations of birds and mammals. Furthermore, the list of taxa with exceptional TE landscapes is growing. We emphasize that the current bottleneck in genome analyses lies in the proper annotation of TEs and provide examples where superficial analyses led to misleading conclusions about genome evolution. Finally, recent advances in long-read sequencing will soon permit access to TE-rich genomic regions that previously resisted assembly including the gigantic, TE-rich genomes of salamanders and lungfishes.
Affiliation(s)
- Roy N. Platt
  - Department of Biological Sciences, Texas Tech University, Lubbock, TX
- Alexander Suh
  - Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
- David A. Ray
  - Department of Biological Sciences, Texas Tech University, Lubbock, TX
Collapse
|
21
|
Cardone MF, D'Addabbo P, Alkan C, Bergamini C, Catacchio CR, Anaclerio F, Chiatante G, Marra A, Giannuzzi G, Perniola R, Ventura M, Antonacci D. Inter-varietal structural variation in grapevine genomes. Plant J 2016; 88:648-661. [PMID: 27419916 DOI: 10.1111/tpj.13274] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 03/16/2015] [Revised: 07/12/2016] [Accepted: 07/13/2016] [Indexed: 05/10/2023]
Abstract
Grapevine (Vitis vinifera L.) is one of the world's most important crop plants, which is of large economic value for fruit and wine production. There is much interest in identifying genomic variations and their functional effects on inter-varietal, phenotypic differences. Using an approach developed for the analysis of human and mammalian genomes, which combines high-throughput sequencing, array comparative genomic hybridization, fluorescent in situ hybridization and quantitative PCR, we created an inter-varietal atlas of structural variations and single nucleotide variants (SNVs) for the grapevine genome analyzing four economically and genetically relevant table grapevine varieties. We found 4.8 million SNVs and detected 8% of the grapevine genome to be affected by genomic variations. We identified more than 700 copy number variation (CNV) regions and more than 2000 genes subjected to CNV as potential candidates for phenotypic differences between varieties.
Affiliation(s)
- Maria Francesca Cardone
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CREA)-Unità di ricerca per l'uva da tavola e la vitivinicoltura in ambiente mediterraneo, Research Unit for viticulture and enology in Southern Italy, Turi (BA), Italy
- Pietro D'Addabbo
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Can Alkan
- Department of Computer Engineering, Bilkent University, Ankara, TR-06800, Turkey
- Carlo Bergamini
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CREA)-Unità di ricerca per l'uva da tavola e la vitivinicoltura in ambiente mediterraneo, Research Unit for viticulture and enology in Southern Italy, Turi (BA), Italy
- Fabio Anaclerio
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Giorgia Chiatante
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CREA)-Unità di ricerca per l'uva da tavola e la vitivinicoltura in ambiente mediterraneo, Research Unit for viticulture and enology in Southern Italy, Turi (BA), Italy
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Annamaria Marra
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Giuliana Giannuzzi
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rocco Perniola
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CREA)-Unità di ricerca per l'uva da tavola e la vitivinicoltura in ambiente mediterraneo, Research Unit for viticulture and enology in Southern Italy, Turi (BA), Italy
- Mario Ventura
- Dipartimento di Biologia, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
- Donato Antonacci
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CREA)-Unità di ricerca per l'uva da tavola e la vitivinicoltura in ambiente mediterraneo, Research Unit for viticulture and enology in Southern Italy, Turi (BA), Italy
22
Mohajeri K, Cantsilieris S, Huddleston J, Nelson BJ, Coe BP, Campbell CD, Baker C, Harshman L, Munson KM, Kronenberg ZN, Kremitzki M, Raja A, Catacchio CR, Graves TA, Wilson RK, Ventura M, Eichler EE. Interchromosomal core duplicons drive both evolutionary instability and disease susceptibility of the Chromosome 8p23.1 region. Genome Res 2016; 26:1453-1467. [PMID: 27803192 PMCID: PMC5088589 DOI: 10.1101/gr.211284.116] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/11/2016] [Accepted: 09/12/2016] [Indexed: 12/13/2022]
Abstract
Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10⁻⁵). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.
Affiliation(s)
- Kiana Mohajeri
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- John Huddleston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Bradley J Nelson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Bradley P Coe
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Catarina D Campbell
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Lana Harshman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Milinn Kremitzki
- The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Archana Raja
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Tina A Graves
- The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Richard K Wilson
- The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Mario Ventura
- Dipartimento di Biologia, Università degli Studi di Bari Aldo Moro, Bari 70125, Italy
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
23
Gordon D, Huddleston J, Chaisson MJP, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A, Fiddes I, Hillier LW, Dunn C, Baker C, Armstrong J, Diekhans M, Paten B, Shendure J, Wilson RK, Haussler D, Chin CS, Eichler EE. Long-read sequence assembly of the gorilla genome. Science 2016; 352:aae0344. [PMID: 27034376 PMCID: PMC4920363 DOI: 10.1126/science.aae0344] [Citation(s) in RCA: 223] [Impact Index Per Article: 27.9] [Received: 12/07/2015] [Accepted: 02/26/2016] [Indexed: 12/24/2022]
Abstract
Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.
Affiliation(s)
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- John Huddleston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher M Hill
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Maika Malig
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Archana Raja
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Ian Fiddes
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- LaDeana W Hillier
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joel Armstrong
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Benedict Paten
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Richard K Wilson
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Chen-Shan Chin
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
24
Ghenu AH, Bolker BM, Melnick DJ, Evans BJ. Multicopy gene family evolution on primate Y chromosomes. BMC Genomics 2016; 17:157. [PMID: 26925773 PMCID: PMC4772468 DOI: 10.1186/s12864-015-2187-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/21/2015] [Accepted: 11/02/2015] [Indexed: 12/12/2022] Open
Abstract
Background: The primate Y chromosome is distinguished by a lack of inter-chromosomal recombination along most of its length, extensive gene loss, and a prevalence of repetitive elements. A group of genes on the male-specific portion of the Y chromosome known as the “ampliconic genes” are present in multiple copies that are sometimes part of palindromes, and that undergo a form of intra-chromosomal recombination called gene conversion, wherein the nucleotides of one copy are homogenized by those of another. With the aim of further understanding gene family evolution of these genes, we collected nucleotide sequence and gene copy number information for several species of papionin monkey. We then tested for evidence of gene conversion, and developed a novel statistical framework to evaluate alternative models of gene family evolution using our data combined with other information from a human, a chimpanzee, and a rhesus macaque. Results: Our results (i) recovered evidence for several novel examples of gene conversion in papionin monkeys and indicate that (ii) ampliconic gene families evolve faster than autosomal gene families and than single-copy genes on the Y chromosome and that (iii) Y-linked singleton and autosomal gene families evolved faster in humans and chimps than they do in the other Old World Monkey lineages we studied. Conclusions: Rapid evolution of ampliconic genes cannot be attributed solely to residence on the Y chromosome, nor to variation between primate lineages in the rate of gene family evolution. Instead other factors, such as natural selection and gene conversion, appear to play a role in driving temporal and genomic evolutionary heterogeneity in primate gene families.
Affiliation(s)
- Ana-Hermina Ghenu
- Biology Department, McMaster University, 1280 Main Street West, Hamilton, L8S 4K1, Canada.
- Benjamin M Bolker
- Biology Department, McMaster University, 1280 Main Street West, Hamilton, L8S 4K1, Canada; Department of Mathematics & Statistics, McMaster University, 1280 Main Street West, Hamilton, L8S 4K1, Canada
- Don J Melnick
- Department of Ecology, Evolution, and Environmental Biology, Columbia University, 10th Floor Schermerhorn Extension, New York, 10027, USA
- Ben J Evans
- Biology Department, McMaster University, 1280 Main Street West, Hamilton, L8S 4K1, Canada.
25
Abstract
Mammalian genomes harbor autonomous retrotransposons coding for the proteins required for their own mobilization, and nonautonomous retrotransposons, such as the human SVA element, which are transcribed but do not have any coding capacity. Mobilization of nonautonomous retrotransposons depends on the recruitment of the protein machinery encoded by autonomous retrotransposons. Here, we summarize the experimental details of SVA trans-mobilization assays which address multiple questions regarding the biology of both nonautonomous SVA elements and autonomous LINE-1 (L1) retrotransposons. The assay evaluates if and to what extent a noncoding SVA element is mobilized in trans by the L1-encoded protein machinery, the structural organization of the resulting marked de novo insertions, if they mimic endogenous SVA insertions and what the roles of individual domains of the nonautonomous retrotransposon for SVA mobilization are. Furthermore, the highly sensitive trans-mobilization assay can be used to verify the presence of otherwise barely detectable endogenously expressed functional L1 proteins via their marked SVA trans-mobilizing activity.
Affiliation(s)
- Anja Bock
- Division of Medical Biotechnology, Paul-Ehrlich-Institut, Paul-Ehrlich-Strasse 51-59, 63225, Langen, Germany
- Gerald G Schumann
- Division of Medical Biotechnology, Paul-Ehrlich-Institut, Paul-Ehrlich-Strasse 51-59, 63225, Langen, Germany.
26
Xin H, Nahar S, Zhu R, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O. Optimal seed solver: optimizing seed selection in read mapping. Bioinformatics 2015; 32:1632-42. [PMID: 26568624 DOI: 10.1093/bioinformatics/btv670] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/19/2015] [Accepted: 11/09/2015] [Indexed: 11/12/2022]
Abstract
MOTIVATION Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. RESULTS We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in [Formula: see text] operations on average and in [Formula: see text] operations in the worst case, while generating a maximum of [Formula: see text] seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations. AVAILABILITY AND IMPLEMENTATION We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver CONTACT hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
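The seed-selection problem stated in this abstract can be illustrated with a deliberately naive dynamic program. This is only a sketch under stated assumptions, not the Optimal Seed Solver itself: it assumes the x seeds tile the read end to end (extending a seed can never raise its frequency), it counts seed frequency by scanning a toy reference string rather than querying a seed frequency database, and it applies none of the optimizations behind the operation counts reported above. The names `count_occurrences` and `least_frequent_seeds` are hypothetical.

```python
from functools import lru_cache


def count_occurrences(seed: str, reference: str) -> int:
    """Count (possibly overlapping) occurrences of seed in reference."""
    count = 0
    start = reference.find(seed)
    while start != -1:
        count += 1
        start = reference.find(seed, start + 1)
    return count


def least_frequent_seeds(read: str, reference: str, x: int):
    """Split read into x contiguous seeds with minimal total frequency.

    Returns (total_frequency, seeds). Naive DP over cut positions;
    an illustration of the problem, not the published algorithm.
    """
    if not 1 <= x <= len(read):
        raise ValueError("x must be between 1 and the read length")

    @lru_cache(maxsize=None)
    def best(end: int, k: int):
        # Best way to split the prefix read[:end] into k non-empty seeds.
        if k == 1:
            seed = read[:end]
            return count_occurrences(seed, reference), (seed,)
        options = []
        for cut in range(k - 1, end):
            total, seeds = best(cut, k - 1)
            last = read[cut:end]
            options.append(
                (total + count_occurrences(last, reference), seeds + (last,))
            )
        return min(options)

    return best(len(read), x)
```

Calling `least_frequent_seeds(read, reference, x)` returns the minimal total frequency together with one seed set that achieves it; ties are broken lexicographically by the seed tuple.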
Affiliation(s)
- John Emmons
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Carl Kingsford
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Can Alkan
- Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- Onur Mutlu
- Computer Science Department, Department of Electrical and Computer Engineering
27
Catacchio CR, Ragone R, Chiatante G, Ventura M. Organization and evolution of Gorilla centromeric DNA from old strategies to new approaches. Sci Rep 2015; 5:14189. [PMID: 26387916 PMCID: PMC4585704 DOI: 10.1038/srep14189] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/23/2015] [Accepted: 08/18/2015] [Indexed: 11/09/2022] Open
Abstract
The centromere/kinetochore interaction is responsible for the pairing and segregation of replicated chromosomes in eukaryotes. Centromere DNA is portrayed as scarcely conserved, repetitive in nature, quickly evolving and protein-binding competent. Among primates, the major class of centromeric DNA is the pancentromeric α-satellite, made of arrays of 171 bp monomers, repeated in a head-to-tail pattern. α-satellite sequences can either form tandem heterogeneous monomeric arrays or assemble in higher-order repeats (HORs). Gorilla centromere DNA has barely been characterized, and data are mainly based on hybridizations of human alphoid sequences. We isolated and finely characterized gorilla α-satellite sequences and revealed relevant structure and chromosomal distribution similarities with other great apes as well as gorilla-specific features, such as the uniquely octameric structure of the suprachromosomal family-2 (SF2). We demonstrated for the first time the orthologous localization of alphoid suprachromosomal families-1 and −2 (SF1 and SF2) between human and gorilla in contrast to chimpanzee centromeres. Finally, the discovery of a new 189 bp monomer type in gorilla centromeres unravels clues to the role of the centromere protein B, paving the way to solve the significance of the centromere DNA’s essential repetitive nature in association with its function and the peculiar evolution of the α-satellite sequence.
Affiliation(s)
- C R Catacchio
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
- R Ragone
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
- G Chiatante
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
- M Ventura
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
28
Rosales-Reynoso MA, Juárez-Vázquez CI, Barros-Núñez P. Evolution and genomics of the human brain. Neurologia 2015; 33:254-265. [PMID: 26304653 DOI: 10.1016/j.nrl.2015.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/29/2014] [Accepted: 06/01/2015] [Indexed: 01/20/2023] Open
Abstract
Most living beings are able to perform actions that can be considered intelligent or, at the very least, the result of an appropriate reaction to changing circumstances in their environment. However, the intelligence or intellectual processes of humans are vastly superior to those achieved by all other species. The adult human brain is a highly complex organ weighing approximately 1500 g, which accounts for only 2% of the total body weight but consumes an amount of energy equal to that required by all skeletal muscle at rest. Although the human brain displays a typical primate structure, it can be identified by its specific distinguishing features. The process of evolution and humanisation of the Homo sapiens brain resulted in a unique and distinct organ with the largest relative volume of any animal species. It also permitted structural reorganization of tissues and circuits in specific segments and regions. These steps explain the remarkable cognitive abilities of modern humans compared not only with other species in our genus, but also with older members of our own species. Brain evolution required the coexistence of two adaptation mechanisms. The first involves genetic changes that occur at the species level, and the second occurs at the individual level and involves changes in chromatin organisation or epigenetic changes. The genetic mechanisms include: a) genetic changes in coding regions that lead to changes in the sequence and activity of existing proteins; b) duplication and deletion of previously existing genes; c) changes in gene expression through changes in the regulatory sequences of different genes; and d) synthesis of non-coding RNAs. Lastly, this review describes some of the main documented chromosomal differences between humans and great apes. These differences have also contributed to the evolution and humanisation process of the H. sapiens brain.
Affiliation(s)
- M A Rosales-Reynoso
- División de Medicina Molecular, Centro de Investigación Biomédica de Occidente, Instituto Mexicano del Seguro Social, Guadalajara, Jalisco, México
- C I Juárez-Vázquez
- División de Medicina Molecular, Centro de Investigación Biomédica de Occidente, Instituto Mexicano del Seguro Social, Guadalajara, Jalisco, México
- P Barros-Núñez
- División de Genética, Centro de Investigación Biomédica de Occidente, Instituto Mexicano del Seguro Social, Guadalajara, Jalisco, México.
29
Abstract
The world of primate genomics is expanding rapidly in new and exciting ways owing to lowered costs and new technologies in molecular methods and bioinformatics. The primate order is composed of 78 genera and 478 species, including human. Taxonomic inferences are complex and likely a consequence of ongoing hybridization, introgression, and reticulate evolution among closely related taxa. Recently, we applied large-scale sequencing methods and extensive taxon sampling to generate a highly resolved phylogeny that affirms, reforms, and extends previous depictions of primate speciation. The next stage of research uses this phylogeny as a foundation for investigating genome content, structure, and evolution across primates. Ongoing and future applications of a robust primate phylogeny are discussed, highlighting advancements in adaptive evolution of genes and genomes, taxonomy and conservation management of endangered species, next-generation genomic technologies, and biomedicine.
Affiliation(s)
- Jill Pecon-Slattery
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702; Current Affiliation: Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia 22630
30
Twyford AD, Streisfeld MA, Lowry DB, Friedman J. Genomic studies on the nature of species: adaptation and speciation in Mimulus. Mol Ecol 2015; 24:2601-9. [DOI: 10.1111/mec.13190] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/31/2015] [Revised: 03/25/2015] [Accepted: 03/27/2015] [Indexed: 12/27/2022]
Affiliation(s)
- Alex D. Twyford
- Ashworth Laboratories; Institute of Evolutionary Biology; The University of Edinburgh; Charlotte Auerbach Road Edinburgh EH9 3FL UK
- Department of Biology; Syracuse University; 107 College Place Syracuse NY 13244 USA
- David B. Lowry
- Plant Biology Laboratories; Department of Plant Biology; Michigan State University; 612 Wilson Road Room 166 East Lansing MI 48824 USA
- Jannice Friedman
- Department of Biology; Syracuse University; 107 College Place Syracuse NY 13244 USA
31
Reno PL. Genetic and developmental basis for parallel evolution and its significance for hominoid evolution. Evol Anthropol 2015; 23:188-200. [PMID: 25347977 DOI: 10.1002/evan.21417] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 12/21/2022]
Abstract
Greater understanding of ape comparative anatomy and evolutionary history has brought a general appreciation that the hominoid radiation is characterized by substantial homoplasy.(1-4) However, little consensus has been reached regarding which features result from repeated evolution. This has important implications for reconstructing ancestral states throughout hominoid evolution, including the nature of the Pan-Homo last common ancestor (LCA). Advances from evolutionary developmental biology (evo-devo) have expanded the diversity of model organisms available for uncovering the morphogenetic mechanisms underlying instances of repeated phenotypic change. Of particular relevance to hominoids are data from adaptive radiations of birds, fish, and even flies demonstrating that parallel phenotypic changes often use similar genetic and developmental mechanisms. The frequent reuse of a limited set of genes and pathways underlying phenotypic homoplasy suggests that the conserved nature of the genetic and developmental architecture of animals can influence evolutionary outcomes. Such biases are particularly likely to be shared by closely related taxa that reside in similar ecological niches and face common selective pressures. Consideration of these developmental and ecological factors provides a strong theoretical justification for the substantial homoplasy observed in the evolution of complex characters and the remarkable parallel similarities that can occur in closely related taxa. Thus, as in other branches of the hominoid radiation, repeated phenotypic evolution within African apes is also a distinct possibility. If so, the availability of complete genomes for each of the hominoid genera makes them another model to explore the genetic basis of repeated evolution.
Affiliation(s)
- Philip L Reno
- Department of Anthropology, The Pennsylvania State University, University Park, PA, 16802
32
Leung WY, Marschall T, Paudel Y, Falquet L, Mei H, Schönhuth A, Maoz Moss TY. SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines. BMC Genomics 2015; 16:238. [PMID: 25887570 PMCID: PMC4520269 DOI: 10.1186/s12864-015-1376-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2014] [Accepted: 02/21/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a) Creating an automated, standardized pipeline for SV prediction. b) Identifying the best tool(s) for SV prediction through benchmarking. c) Providing a statistically sound method for merging SV calls. RESULTS The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. CONCLUSION SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. 
The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers.
Affiliation(s)
- Wai Yi Leung
- Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands.
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany; Centrum Wiskunde and Informatica, Amsterdam, The Netherlands.
- Yogesh Paudel
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands.
- Laurent Falquet
- University of Fribourg and Swiss Institute of Bioinformatics, Fribourg, Switzerland.
- Hailiang Mei
- Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands.
|
33
|
Xin H, Greth J, Emmons J, Pekhimenko G, Kingsford C, Alkan C, Mutlu O. Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinformatics 2015; 31:1553-60. [PMID: 25577434 DOI: 10.1093/bioinformatics/btu856] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2014] [Accepted: 12/23/2014] [Indexed: 11/13/2022]
Abstract
MOTIVATION Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used. RESULTS We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers's bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.
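The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' SIMD implementation: the function name `shifted_hamming_filter` and its parameters are hypothetical, and it keeps only the core idea of comparing the read against the reference under every shift within the error threshold. A read position that matches no reference base under any allowed shift must be covered by an edit, so a pair with more such positions than the threshold can be rejected without running a full alignment.

```python
def shifted_hamming_filter(read: str, ref: str, max_errors: int) -> bool:
    """Return False only when the pair certainly exceeds max_errors edits.

    Simplified illustration (an assumption, not the published algorithm):
    a read base at position i counts as "unmatched" if it equals no
    reference base at positions i - max_errors .. i + max_errors.
    """
    unmatched = 0
    for i, base in enumerate(read):
        lo = max(0, i - max_errors)
        hi = min(len(ref), i + max_errors + 1)
        # No shift within the threshold explains this base.
        if base not in ref[lo:hi]:
            unmatched += 1
            if unmatched > max_errors:
                return False  # too many unexplained bases: filter out
    return True  # pair survives and goes on to full verification


if __name__ == "__main__":
    print(shifted_hamming_filter("ACGTACGT", "ACGAACGT", 1))
    print(shifted_hamming_filter("AAAAAAAA", "CCCCCCCC", 2))
```

A pair that passes this check may still exceed the threshold, so the sketch only reduces the number of pairs sent to the more expensive edit-distance computation; it never discards a pair that is within the threshold.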
Affiliation(s)
- Hongyi Xin
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- John Greth
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- John Emmons
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- Gennady Pekhimenko
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- Carl Kingsford
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- Can Alkan
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
- Onur Mutlu
- Computer Science Department, Department of Electrical and Computer Engineering, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey
|
34
|
Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014; 16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2014] [Indexed: 01/01/2023] Open
Abstract
From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms detecting such variations (callers) have recently been developed, spurred by the advent of the next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, increasingly the various callers are combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
|
35
|
Lee HE, Eo J, Kim HS. Composition and evolutionary importance of transposable elements in humans and primates. Genes Genomics 2014. [DOI: 10.1007/s13258-014-0249-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
|
36
|
Labunskyy VM, Hatfield DL, Gladyshev VN. Selenoproteins: molecular pathways and physiological roles. Physiol Rev 2014; 94:739-77. [PMID: 24987004 DOI: 10.1152/physrev.00039.2013] [Citation(s) in RCA: 812] [Impact Index Per Article: 81.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open
Abstract
Selenium is an essential micronutrient with important functions in human health and relevance to several pathophysiological conditions. The biological effects of selenium are largely mediated by selenium-containing proteins (selenoproteins) that are present in all three domains of life. Although selenoproteins represent diverse molecular pathways and biological functions, all these proteins contain at least one selenocysteine (Sec), a selenium-containing amino acid, and most serve oxidoreductase functions. Sec is cotranslationally inserted into nascent polypeptide chains in response to the UGA codon, whose normal function is to terminate translation. To decode UGA as Sec, organisms evolved the Sec insertion machinery that allows incorporation of this amino acid at specific UGA codons in a process requiring a cis-acting Sec insertion sequence (SECIS) element. Although the basic mechanisms of Sec synthesis and insertion into proteins in both prokaryotes and eukaryotes have been studied in great detail, the identity and functions of many selenoproteins remain largely unknown. In the last decade, there has been significant progress in characterizing selenoproteins and selenoproteomes and understanding their physiological functions. We discuss current knowledge about how these unique proteins perform their functions at the molecular level and highlight new insights into the roles that selenoproteins play in human health.
Affiliation(s)
- Vyacheslav M Labunskyy
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; and Molecular Biology of Selenium Section, Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Dolph L Hatfield
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; and Molecular Biology of Selenium Section, Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
- Vadim N Gladyshev
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; and Molecular Biology of Selenium Section, Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
|
37
|
Recombinogenic telomeres in diploid Sorex granarius (Soricidae, Eulipotyphla) fibroblast cells. Mol Cell Biol 2014; 34:2786-99. [PMID: 24842907 DOI: 10.1128/mcb.01697-13] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
The telomere structure in the Iberian shrew Sorex granarius is characterized by unique, striking features, with short arms of acrocentric chromosomes carrying extremely long telomeres (up to 300 kb) with interspersed ribosomal DNA (rDNA) repeat blocks. In this work, we investigated the telomere physiology of S. granarius fibroblast cells and found that telomere repeats are transcribed on both strands and that there is no telomere-dependent senescence mechanism. Although telomerase activity is detectable throughout cell culture and appears to act on both short and long telomeres, we also discovered that signatures of a recombinogenic activity are omnipresent, including telomere-sister chromatid exchanges, formation of alternative lengthening of telomeres (ALT)-associated PML-like bodies, production of telomere circles, and a high frequency of telomeres carrying marks of a DNA damage response. Our results suggest that recombination participates in the maintenance of the very long telomeres in normal S. granarius fibroblasts. We discuss the possible interplay between the interspersed telomere and rDNA repeats in the stabilization of the very long telomeres in this organism.
|
38
|
Borštnik B, Pumpernik D. The apparent enhancement of CpG transversions in primate lineage is a consequence of multiple replacements. J Bioinform Comput Biol 2014; 12:1450011. [PMID: 24969749 DOI: 10.1142/s0219720014500115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
We claim that the apparently enhanced CpG transversions in the form CpG to CpC/GpG or to ApG/CpT are caused by the hypermutable CpG to CpA/TpG transition. The nucleotide replacement counts obtained from the human/chimpanzee/gorilla/orangutan sequence alignments, representing the replacements due to the evolutionary species divergence, and the results of the 1000 Genomes Project, which provide the differences due to intraspecies diversification, were analyzed to estimate the ratio of CpG versus non-CpG transversion probabilities. The trinucleotide replacement counts were extracted from the regions that are free of functional constraints. The CpG transversion probabilities based upon the genomic comparisons were found to be more than twice those of the non-CpG transversions. The diversity data emerging from 14 population groups were partitioned into five classes as a function of the parameter quantifying the spread of the polymorphic allele among the group of individuals. The results based upon the human polymorphism exhibit a trend where the CpG over non-CpG transversion probability ratio exceeds unity by less and less as the derived allele frequency (DAF) of the SNPs decreases. A computer simulation of a simplified model indicates that the phenomenon of the apparent enhancement of CpG transversions can have its source in the interference of the entropic effects with the maximum likelihood methodologies.
Affiliation(s)
- Branko Borštnik
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
|
39
|
Gong Q, Tao Y, Yang JR, Cai J, Yuan Y, Ruan J, Yang J, Liu H, Li W, Lu X, Zhuang SM, Wang SM, Wu CI. Identification of medium-sized genomic deletions with low coverage, mate-paired restricted tags. BMC Genomics 2013; 14:51. [PMID: 23347462 PMCID: PMC3608957 DOI: 10.1186/1471-2164-14-51] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2012] [Accepted: 01/18/2013] [Indexed: 11/30/2022] Open
Abstract
Background Genomic deletions are known to be widespread in many species. Variant sequencing-based approaches for identifying deletions have been developed, but their powers to detect those deletions that affect medium-sized regions are limited when the sequencing coverage is low. Results We present a cost-effective method for identifying medium-sized deletions in genomic regions with low genomic coverage. Two mate-paired libraries were separately constructed from human cancerous tissue to generate paired short reads (ditags) from restriction fragments digested with a 4-base restriction enzyme. A total of 3 Gb of paired reads (1.0× genome size) was collected, and 175 deletions were inferred by identifying the ditags with disorder alignments to the reference genome sequence. Sanger sequencing results confirmed an overall detection accuracy of 95%. Good reproducibility was verified by the deletions that were detected by both libraries. Conclusions We provide an approach to accurately identify medium-sized deletions in large genomes with low sequence coverage. It can be applied in studies of comparative genomics and in the identification of germline and somatic variants.
|
40
|
McLain AT, Carman GW, Fullerton ML, Beckstrom TO, Gensler W, Meyer TJ, Faulk C, Batzer MA. Analysis of western lowland gorilla (Gorilla gorilla gorilla) specific Alu repeats. Mob DNA 2013; 4:26. [PMID: 24262036 PMCID: PMC4177385 DOI: 10.1186/1759-8753-4-26] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Accepted: 10/23/2013] [Indexed: 02/07/2023] Open
Abstract
Background Research into great ape genomes has revealed widely divergent activity levels over time for Alu elements. However, the diversity of this mobile element family in the genome of the western lowland gorilla has previously been uncharacterized. Alu elements are primate-specific short interspersed elements that have been used as phylogenetic and population genetic markers for more than two decades. Alu elements are present at high copy number in the genomes of all primates surveyed thus far. The AluY subfamily and its derivatives have been recognized as the evolutionarily youngest Alu subfamily in the Old World primate lineage. Results Here we use a combination of computational and wet-bench laboratory methods to assess and catalog AluY subfamily activity level and composition in the western lowland gorilla genome (gorGor3.1). A total of 1,075 independent AluY insertions were identified and computationally divided into 10 subfamilies, with the largest number of gorilla-specific elements assigned to the canonical AluY subfamily. Conclusions The retrotransposition activity level appears to be significantly lower than that seen in the human and chimpanzee lineages, while higher than that seen in orangutan genomes, indicative of differential Alu amplification in the western lowland gorilla lineage as compared to other Homininae.
Affiliation(s)
- Adam T McLain
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
|
41
|
Giannuzzi G, Pazienza M, Huddleston J, Antonacci F, Malig M, Vives L, Eichler EE, Ventura M. Hominoid fission of chromosome 14/15 and the role of segmental duplications. Genome Res 2013; 23:1763-73. [PMID: 24077392 PMCID: PMC3814877 DOI: 10.1101/gr.156240.113] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Ape chromosomes homologous to human chromosomes 14 and 15 were generated by a fission event of an ancestral submetacentric chromosome, where the two chromosomes were joined head-to-tail. The hominoid ancestral chromosome most closely resembles the macaque chromosome 7. In this work, we provide insights into the evolution of human chromosomes 14 and 15, performing a comparative study between macaque boundary region 14/15 and the orthologous human regions. We construct a 1.6-Mb contig of macaque BAC clones in the region orthologous to the ancestral hominoid fission site and use it to define the structural changes that occurred on human 14q pericentromeric and 15q subtelomeric regions. We characterize the novel euchromatin–heterochromatin transition region (∼20 Mb) acquired during the neocentromere establishment on chromosome 14, and find it was mainly derived through pericentromeric duplications from ancestral hominoid chromosomes homologous to human 2q14–qter and 10. Further, we show a relationship between evolutionary hotspots and low-copy repeat loci for chromosome 15, revealing a possible role of segmental duplications not only in mediating but also in “stitching” together rearrangement breakpoints.
Affiliation(s)
- Giuliana Giannuzzi
- Dipartimento di Biologia, Università degli Studi di Bari "Aldo Moro," Bari 70125, Italy
|
42
|
Abstract
We analyzed 83 fully sequenced great ape genomes for mobile element insertions, predicting a total of 49,452 fixed and polymorphic Alu and long interspersed element 1 (L1) insertions not present in the human reference assembly and assigning each retrotransposition event to a different time point during great ape evolution. We used these homoplasy-free markers to construct a mobile element insertions-based phylogeny of humans and great apes and demonstrate their differential power to discern ape subspecies and populations. Within this context, we find a good correlation between L1 diversity and single-nucleotide polymorphism heterozygosity (r² = 0.65) in contrast to Alu repeats, which show little correlation (r² = 0.07). We estimate that the "rate" of Alu retrotransposition has differed by a factor of 15-fold in these lineages. Humans, chimpanzees, and bonobos show the highest rates of Alu accumulation, the latter two since divergence 1.5 Mya. The L1 insertion rate, in contrast, has remained relatively constant, with rates differing by less than a factor of three. We conclude that Alu retrotransposition has been the most variable form of genetic variation during recent human-great ape evolution, with increases and decreases occurring over very short periods of evolutionary time.
|
43
|
Abstract
Copy number variation (CNV) contributes to disease and has restructured the genomes of great apes. The diversity and rate of this process, however, have not been extensively explored among great ape lineages. We analyzed 97 deeply sequenced great ape and human genomes and estimate 16% (469 Mb) of the hominid genome has been affected by recent CNV. We identify a comprehensive set of fixed gene deletions (n = 340) and duplications (n = 405) as well as >13.5 Mb of sequence that has been specifically lost on the human lineage. We compared the diversity and rates of copy number and single nucleotide variation across the hominid phylogeny. We find that CNV diversity partially correlates with single nucleotide diversity (r² = 0.5) and recapitulates the phylogeny of apes with few exceptions. Duplications significantly outpace deletions (2.8-fold). The load of segregating duplications remains significantly higher in bonobos, Western chimpanzees, and Sumatran orangutans, populations that have experienced recent genetic bottlenecks (P = 0.0014, 0.02, and 0.0088, respectively). The rate of fixed deletion has been more clocklike with the exception of the chimpanzee lineage, where we observe a twofold increase in the chimpanzee–bonobo ancestor (P = 4.79 × 10⁻⁹) and increased deletion load among Western chimpanzees (P = 0.002). The latter includes the first genomic disorder in a chimpanzee with features resembling Smith-Magenis syndrome mediated by a chimpanzee-specific increase in segmental duplication complexity. We hypothesize that demographic effects, such as bottlenecks, have contributed to larger and more gene-rich segments being deleted in the chimpanzee lineage and that this effect, more generally, may account for episodic bursts in CNV during hominid evolution.
|
44
|
Prado-Martinez J, Hernando-Herraez I, Lorente-Galdos B, Dabad M, Ramirez O, Baeza-Delgado C, Morcillo-Suarez C, Alkan C, Hormozdiari F, Raineri E, Estellé J, Fernandez-Callejo M, Valles M, Ritscher L, Schöneberg T, de la Calle-Mustienes E, Casillas S, Rubio-Acero R, Melé M, Engelken J, Caceres M, Gomez-Skarmeta JL, Gut M, Bertranpetit J, Gut IG, Abello T, Eichler EE, Mingarro I, Lalueza-Fox C, Navarro A, Marques-Bonet T. The genome sequencing of an albino Western lowland gorilla reveals inbreeding in the wild. BMC Genomics 2013; 14:363. [PMID: 23721540 PMCID: PMC3673836 DOI: 10.1186/1471-2164-14-363] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2013] [Accepted: 05/23/2013] [Indexed: 11/28/2022] Open
Abstract
Background The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, this is still unknown. Here, we study the genetic cause of his albinism and, making use of whole genome sequencing data, find a higher inbreeding coefficient compared to other gorillas. Results We successfully identified the causal genetic variant for Snowflake’s albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that shows that this amino acid replacement alters the membrane spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake’s parents were related, this being the first report of inbreeding in a wild-born Western lowland gorilla. Conclusions In this study we demonstrate how the use of whole genome sequencing can be extended to link genotype and phenotype in non-model organisms and that, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).
Affiliation(s)
- Javier Prado-Martinez
- Institut de Biologia Evolutiva, (CSIC-Universitat Pompeu Fabra), PRBB, Barcelona 08003, Spain
|
45
|
Novo C, Arnoult N, Bordes WY, Castro-Vega L, Gibaud A, Dutrillaux B, Bacchetti S, Londoño-Vallejo A. The heterochromatic chromosome caps in great apes impact telomere metabolism. Nucleic Acids Res 2013; 41:4792-801. [PMID: 23519615 PMCID: PMC3643582 DOI: 10.1093/nar/gkt169] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
In contrast with the limited sequence divergence accumulated after separation of higher primate lineages, marked cytogenetic variation has been associated with the genome evolution in these species. Studying the impact of such structural variations on defined molecular processes can provide valuable insights on how genome structural organization contributes to organismal evolution. Here, we show that telomeres on chromosome arms carrying subtelomeric heterochromatic caps in the chimpanzee, which are completely absent in humans, replicate later than telomeres on chromosome arms without caps. In gorilla, on the other hand, a proportion of the subtelomeric heterochromatic caps present in most chromosome arms are associated with large blocks of telomere-like sequences that follow a replication program different from that of bona fide telomeres. Strikingly, telomere-containing RNA accumulates extrachromosomally in gorilla mitotic cells, suggesting that at least some aspects of telomere-containing RNA biogenesis have diverged in gorilla, perhaps in concert with the evolution of heterochromatic caps in this species.
Affiliation(s)
- Clara Novo
- Telomeres and Cancer laboratory, 'Equipe Labellisée Ligue contre le Cancer', UMR3244, Institut Curie, 26 rue d'Ulm, 75248 Paris, France
|
46
|
Moser AB, Hey J, Dranchak PK, Karaman MW, Zhao J, Cox LA, Ryder OA, Hacia JG. Diverse captive non-human primates with phytanic acid-deficient diets rich in plant products have substantial phytanic acid levels in their red blood cells. Lipids Health Dis 2013; 12:10. [PMID: 23379307 PMCID: PMC3571895 DOI: 10.1186/1476-511x-12-10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2012] [Accepted: 01/31/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Humans and rodents with impaired phytanic acid (PA) metabolism can accumulate toxic stores of PA that have deleterious effects on multiple organ systems. Ruminants and certain fish obtain PA from the microbial degradation of dietary chlorophyll and/or through chlorophyll-derived precursors. In contrast, humans cannot derive PA from chlorophyll and instead normally obtain it only from meat, dairy, and fish products. RESULTS Captive apes and Old World monkeys had significantly higher red blood cell (RBC) PA levels relative to humans when all subjects were fed PA-deficient diets. Given the adverse health effects resulting from PA overaccumulation, we investigated the molecular evolution of thirteen PA metabolism genes in apes, Old World monkeys, and New World monkeys. All non-human primate (NHP) orthologs are predicted to encode full-length proteins with the marmoset Phyh gene containing a rare, but functional, GA splice donor dinucleotide. Acox2, Scp2, and Pecr sequences had amino acid positions with accelerated substitution rates while Amacr had significant variation in evolutionary rates in apes relative to other primates. CONCLUSIONS Unlike humans, diverse captive NHPs with PA-deficient diets rich in plant products have substantial RBC PA levels. The favored hypothesis is that NHPs can derive significant amounts of PA from the degradation of ingested chlorophyll through gut fermentation. If correct, this raises the possibility that RBC PA levels could serve as a biomarker for evaluating the digestive health of captive NHPs. Furthermore, the evolutionary rates of several genes relevant to PA metabolism provide candidate genetic adaptations to NHP diets.
Affiliation(s)
- Ann B Moser
- Department of Biochemistry and Molecular Biology, Broad Center for Regenerative Medicine and Stem Cell Research, University of Southern California, Los Angeles, CA 90089, USA
|
47
|
Lorente-Galdos B, Bleyhl J, Santpere G, Vives L, Ramírez O, Hernandez J, Anglada R, Cooper GM, Navarro A, Eichler EE, Marques-Bonet T. Accelerated exon evolution within primate segmental duplications. Genome Biol 2013; 14:R9. [PMID: 23360670 PMCID: PMC3906575 DOI: 10.1186/gb-2013-14-1-r9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2012] [Revised: 12/20/2012] [Accepted: 01/29/2013] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND The identification of signatures of natural selection has long been used as an approach to understanding the unique features of any given species. Genes within segmental duplications are overlooked in most studies of selection due to the limitations of draft nonhuman genome assemblies and to the methodological reliance on accurate gene trees, which are difficult to obtain for duplicated genes. RESULTS In this work, we detected exons with an accumulation of high-quality nucleotide differences between the human assembly and shotgun sequencing reads from single human and macaque individuals. Comparing the observed rates of nucleotide differences between coding exons and their flanking intronic sequences with a likelihood-ratio test, we identified 74 exons with evidence for rapid coding sequence evolution during the evolution of humans and Old World monkeys. Fifty-five percent of rapidly evolving exons were either partially or totally duplicated, which is a significant enrichment of the 6% rate observed across all human coding exons. CONCLUSIONS Our results provide a more comprehensive view of the action of selection upon segmental duplications, which are the most complex regions of our genomes. In light of these findings, we suggest that segmental duplications could be subjected to rapid evolution more frequently than previously thought.
Affiliation(s)
- Belen Lorente-Galdos
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- National Institute for Bioinformatics (INB), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Jonathan Bleyhl
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Gabriel Santpere
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Laura Vives
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Oscar Ramírez
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Jessica Hernandez
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Roger Anglada
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| | - Gregory M Cooper
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Arcadi Navarro
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- National Institute for Bioinformatics (INB), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Institucio Catalana de Recerca i Estudis Avançats (ICREA), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, Seattle, Washington 98195, USA
| | - Tomas Marques-Bonet
- IBE, Institute of Evolutionary Biology (Universitat Pompeu Fabra-CSIC), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
- Institucio Catalana de Recerca i Estudis Avançats (ICREA), PRBB, Doctor Aiguader, 88, 08003, Barcelona, Catalonia, Spain
| |
|
48
|
Xin H, Lee D, Hormozdiari F, Yedkar S, Mutlu O, Alkan C. Accelerating read mapping with FastHASH. BMC Genomics 2013; 14 Suppl 1:S13. [PMID: 23369189 PMCID: PMC3549798 DOI: 10.1186/1471-2164-14-s1-s13] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 11/10/2022]
Abstract
With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS. We propose a new algorithm, FastHASH, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering, and Cheap K-mer Selection. We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
Affiliation(s)
- Hongyi Xin
- Depts. of Computer Science and Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
49
|
Abstract
The genomics era has opened up exciting possibilities in the field of conservation biology by enabling genomic analyses of threatened species that previously were limited to model organisms. Next-generation sequencing (NGS) and the collection of genome-wide data allow for more robust studies of the demographic history of populations and adaptive variation associated with fitness and local adaptation. Genomic analyses can also advance management efforts for threatened wild and captive populations by identifying loci contributing to inbreeding depression and disease susceptibility, and predicting fitness consequences of introgression. However, the development of genomic tools in wild species still carries multiple challenges, particularly those associated with computational and sampling constraints. This review provides an overview of the most significant applications of NGS and the implications and limitations of genomic studies in conservation.
Affiliation(s)
- Cynthia C Steiner
- Institute for Conservation Research, San Diego Zoo Global, Escondido, California 92027
|
50
|
Schneider VA, Chen HC, Clausen C, Meric PA, Zhou Z, Bouk N, Husain N, Maglott DR, Church DM. Clone DB: an integrated NCBI resource for clone-associated data. Nucleic Acids Res 2012. [PMID: 23193260 PMCID: PMC3531087 DOI: 10.1093/nar/gks1164] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/21/2023]
Abstract
The National Center for Biotechnology Information (NCBI) Clone DB (http://www.ncbi.nlm.nih.gov/clone/) is an integrated resource providing information about and facilitating access to clones, which serve as valuable research reagents in many fields, including genome sequencing and variation analysis. Clone DB represents an expansion and replacement of the former NCBI Clone Registry and has records for genomic and cell-based libraries and clones representing more than 100 different eukaryotic taxa. Records provide details of library construction, associated sequences, map positions and information about resource distribution. Clone DB is indexed in the NCBI Entrez system and can be queried by fields that include organism, clone name, gene name and sequence identifier. Whenever possible, genomic clones are mapped to reference assemblies and their map positions provided in clone records. Clones mapping to specific genomic regions can also be searched for using the NCBI Clone Finder tool, which accepts queries based on sequence coordinates or features such as gene or transcript names. Clone DB makes reports of library, clone and placement data on its FTP site available for download. With Clone DB, users now have available to them a centralized resource that provides them with the tools they will need to make use of these important research reagents.
Affiliation(s)
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA.
|