1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
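As an illustration of the two ideas this abstract rests on, the following minimal Python sketch counts the overlapping k-mers of a sequence and lists the k-mers that never occur in it, which is the intuition behind absent sequences such as nullomers. The function names `count_kmers` and `absent_kmers` and the toy sequence are assumptions made for this example, not anything taken from the review; reverse complements are ignored.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def absent_kmers(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """List the k-mers over the alphabet that never occur in seq."""
    seen = count_kmers(seq, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]


# Toy input, chosen only for illustration.
counts = count_kmers("ACGTACGT", 2)
missing = absent_kmers("ACGTACGT", 2)
```

For real genomes the number of possible k-mers grows as 4^k, so exhaustive enumeration like this is only practical for small k.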
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Caduff M, Eckel R, Leuenberger C, Wegmann D. Accurate Bayesian inference of sex chromosome karyotypes and sex-linked scaffolds from low-depth sequencing data. Mol Ecol Resour 2024; 24:e13913. [PMID: 38173222 DOI: 10.1111/1755-0998.13913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 11/27/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
The identification of sex-linked scaffolds and the genetic sex of individuals, i.e. their sex karyotype, is a fundamental step in population genomic studies. If sex-linked scaffolds are known, single individuals may be sexed based on read counts of next-generation sequencing data. If both sex-linked scaffolds as well as sex karyotypes are unknown, as is often the case for non-model organisms, they have to be jointly inferred. For both cases, current methods rely on arbitrary thresholds, which limits their power for low-depth data. In addition, most current methods are limited to euploid sex karyotypes (XX and XY). Here we develop BeXY, a fully Bayesian method to jointly infer the posterior probabilities for each scaffold to be autosomal, X- or Y-linked and for each individual to be any of the sex karyotypes XX, XY, X0, XXX, XXY, XYY and XXYY. If the sex-linked scaffolds are known, it also identifies autosomal trisomies and estimates the sex karyotype posterior probabilities for single individuals. As we show with downsampling experiments, BeXY has higher power than all existing methods. It accurately infers the sex karyotype of ancient human samples with as few as 20,000 reads and accurately infers sex-linked scaffolds from data sets of just a handful of samples or with highly imbalanced sex ratios, also in the case of low-quality reference assemblies. We illustrate the power of BeXY by applying it to both whole-genome shotgun and target enrichment sequencing data of ancient and modern humans, as well as several non-model organisms.
Affiliation(s)
- Madleina Caduff
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Raphael Eckel
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Christoph Leuenberger
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Daniel Wegmann
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Swiss Institute of Bioinformatics, Fribourg, Switzerland
3
Wang XB, Lu HW, Liu QY, Li AL, Zhou HL, Zhang Y, Zhu TQ, Ruan J. An effective strategy for assembling the sex-limited chromosome. Gigascience 2024; 13:giae015. [PMID: 38626722 PMCID: PMC11020242 DOI: 10.1093/gigascience/giae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/17/2024] [Accepted: 03/15/2024] [Indexed: 04/18/2024] Open
Abstract
BACKGROUND Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY outperforms in the heterochromatic region and demonstrates comparable performance in other regions. Furthermore, SRY enhances the capabilities of the hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS Our method enables true complete genome assembly and facilitates downstream research of sex-limited chromosomes.
Affiliation(s)
- Xiao-Bo Wang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
  - The Shennong Laboratory/Institute of Crop Molecular Breeding, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Hong-Wei Lu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Qing-You Liu
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan 528225, China
- A-Lun Li
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Hong-Ling Zhou
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Yong Zhang
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Tian-Qi Zhu
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
4
Possible stochastic sex determination in Bursaphelenchus nematodes. Nat Commun 2022; 13:2574. [PMID: 35546147 PMCID: PMC9095866 DOI: 10.1038/s41467-022-30173-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/24/2021] [Accepted: 04/19/2022] [Indexed: 12/13/2022] Open
Abstract
Sex determination mechanisms evolve surprisingly rapidly, yet little is known in the large nematode phylum other than for Caenorhabditis elegans, which relies on chromosomal XX-XO sex determination and a dosage compensation mechanism. Here we analyze by sex-specific genome sequencing and genetic analysis sex determination in two fungal feeding/plant-parasitic Bursaphelenchus nematodes and find that their sex differentiation is more likely triggered by random, epigenetic regulation than by more well-known mechanisms of chromosomal or environmental sex determination. There is no detectable difference in male and female chromosomes, nor any linkage to sexual phenotype. Moreover, the protein sets of these nematodes lack genes involved in X chromosome dosage counting or compensation. By contrast, our genetic screen for sex differentiation mutants identifies a Bursaphelenchus ortholog of tra-1, the major output of the C. elegans sex determination cascade. Nematode sex determination pathways might have evolved by “bottom-up” accretion from the most downstream regulator, tra-1. In most species, sex is determined by genetic or environmental factors. Here, the authors present evidence that sex determination in Bursaphelenchus nematodes is instead likely to be regulated by a random, epigenetic mechanism.
5
Sigeman H, Sinclair B, Hansson B. findZX: an automated pipeline for detecting and visualising sex chromosomes using whole-genome sequencing data. BMC Genomics 2022; 23:328. [PMID: 35477344 PMCID: PMC9044604 DOI: 10.1186/s12864-022-08432-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/10/2021] [Accepted: 03/01/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Sex chromosomes have evolved numerous times, as revealed by recent genomic studies. However, large gaps in our knowledge of sex chromosome diversity across the tree of life remain. Filling these gaps, through the study of novel species, is crucial for improved understanding of why and how sex chromosomes evolve. Characterization of sex chromosomes in already well-studied organisms is also important to avoid misinterpretations of population genomic patterns caused by undetected sex chromosome variation. RESULTS Here we present findZX, an automated Snakemake-based computational pipeline for detecting and visualizing sex chromosomes through differences in genome coverage and heterozygosity between any number of males and females. A main feature of the pipeline is the option to perform a genome coordinate liftover to a reference genome of another species. This allows users to inspect sex-linked regions over larger contiguous chromosome regions, while also providing important between-species synteny information. To demonstrate its effectiveness, we applied findZX to publicly available genomic data from species belonging to widely different taxonomic groups (mammals, birds, reptiles, and fish), with sex chromosome systems of different ages, sizes, and levels of differentiation. We also demonstrate that the liftover method is robust over large phylogenetic distances (> 80 million years of evolution). CONCLUSIONS With findZX we provide an easy-to-use and highly effective tool for identification of sex chromosomes. The pipeline is compatible with both Linux and MacOS systems, and scalable to suit different computational platforms.
Affiliation(s)
- Hanna Sigeman
  - Department of Biology, Lund University, Ecology Building, 223 62, Lund, Sweden
- Bella Sinclair
  - Department of Biology, Lund University, Ecology Building, 223 62, Lund, Sweden
- Bengt Hansson
  - Department of Biology, Lund University, Ecology Building, 223 62, Lund, Sweden
6
Hansen CCR, Westfall KM, Pálsson S. Evaluation of four methods to identify the homozygotic sex chromosome in small populations. BMC Genomics 2022; 23:160. [PMID: 35209843 PMCID: PMC8867824 DOI: 10.1186/s12864-022-08393-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/10/2021] [Accepted: 02/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whole genomes are commonly assembled into a collection of scaffolds and often lack annotations of autosomes, sex chromosomes, and organelle genomes (i.e., mitochondrial and chloroplast). As these chromosome types differ in effective population size and can have highly disparate evolutionary histories, it is imperative to take this information into account when analysing genomic variation. Here we assessed the accuracy of four methods for identifying the homogametic sex chromosome in a small population using two whole genome sequences (WGS) and 133 RAD sequences of white-tailed eagles (Haliaeetus albicilla): i) difference in read depth per scaffold in a male and a female, ii) heterozygosity per scaffold in a male and a female, iii) mapping to the reference genome of a related species (chicken) with annotated sex chromosomes, and iv) analysis of SNP-loadings from a principal components analysis (PCA), based on the low-depth RADseq data. RESULTS The best performing approach was the reference mapping (method iii), which identified 98.12% of the expected homogametic sex chromosome (Z). Read depth per scaffold (method i) identified 86.41% of the homogametic sex chromosome with few false positives. SNP-loading scores (method iv) identified 78.6% of the Z-chromosome and had a false positive discovery rate of more than 10%. Heterozygosity per scaffold (method ii) did not provide clear results due to a lack of diversity in both the Z and autosomal chromosomes, and potential interference from the heterogametic sex chromosome (W). The evaluation of these methods also revealed 10 Mb of putative PAR and gametologous regions. CONCLUSION Identification of the homogametic sex chromosome in a small population is best accomplished by reference mapping or examining differences in read depth between sexes.
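The read-depth comparison (method i) can be sketched roughly as follows. In a ZW system a ZZ male carries two copies of the Z and a ZW female one, so after normalising each sample by its own typical depth, Z-linked scaffolds should show a male-to-female depth ratio near 2 while autosomal scaffolds sit near 1. The function name `z_linked_scaffolds`, the ratio window of 1.5 to 2.5, the median normalisation, and the toy depths are all assumptions made for this illustration; they are not the thresholds or procedure used in the study.

```python
from statistics import median


def z_linked_scaffolds(male_depth, female_depth, low=1.5, high=2.5):
    """Return scaffolds whose normalised male/female depth ratio lies in [low, high].

    male_depth and female_depth map scaffold name -> mean read depth.
    The window [low, high] is a hypothetical choice for this sketch.
    """
    shared = [s for s in male_depth if s in female_depth]
    # Normalise by each sample's median depth so that differences in
    # sequencing effort between the two individuals cancel out.
    male_norm = median(male_depth[s] for s in shared)
    female_norm = median(female_depth[s] for s in shared)
    hits = []
    for s in shared:
        female_rel = female_depth[s] / female_norm
        if female_rel == 0:
            continue  # no female coverage: ratio undefined, skip
        ratio = (male_depth[s] / male_norm) / female_rel
        if low <= ratio <= high:
            hits.append(s)
    return hits


# Toy depths, invented for illustration only.
male = {"scaf1": 30.0, "scaf2": 31.0, "scaf3": 29.0, "scafZ": 30.0}
female = {"scaf1": 20.0, "scaf2": 21.0, "scaf3": 19.0, "scafZ": 10.0}
candidates = z_linked_scaffolds(male, female)
```

A fixed window like this is exactly the kind of arbitrary threshold that becomes fragile at low depth, which is why the result is best read as a list of candidates rather than a final assignment.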
Affiliation(s)
- Kristen M Westfall
  - Department of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland; Current: Fisheries and Oceans Canada, Pacific Biological Station, Nanaimo, BC, Canada
- Snæbjörn Pálsson
  - Department of Life and Environmental Sciences, University of Iceland, Reykjavik, Iceland
7
Sahlin K. Effective sequence similarity detection with strobemers. Genome Res 2021; 31:2080-2094. [PMID: 34667119 PMCID: PMC8559714 DOI: 10.1101/gr.275648.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2021] [Accepted: 08/20/2021] [Indexed: 01/08/2023]
Abstract
k-mer-based methods are widely used in bioinformatics for various types of sequence comparisons. However, a single mutation will mutate k consecutive k-mers and make most k-mer-based applications for sequence comparison sensitive to variable mutation rates. Many techniques have been studied to overcome this sensitivity, for example, spaced k-mers and k-mer permutation techniques, but these techniques do not handle indels well. For indels, pairs or groups of small k-mers are commonly used, but these methods first produce k-mer matches, and only in a second step, a pairing or grouping of k-mers is performed. Such techniques produce many redundant k-mer matches owing to the size of k. Here, we propose strobemers as an alternative to k-mers for sequence comparison. Intuitively, strobemers consist of two or more linked shorter k-mers, where the combination of linked k-mers is decided by a hash function. We use simulated data to show that strobemers provide more evenly distributed sequence matches and are less sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers also produce higher match coverage across sequences. We further implement a proof-of-concept sequence-matching tool StrobeMap and use synthetic and biological Oxford Nanopore sequencing data to show the utility of using strobemers for sequence comparison in different contexts such as sequence clustering and alignment scenarios.
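The claim that one mutation alters k consecutive k-mers can be checked directly. The short Python sketch below compares the k-mers of a sequence with those of a copy carrying a single substitution and reports the start positions at which they differ. The helper names and the toy sequence are assumptions made for this example; the code illustrates only the k-mer sensitivity problem and does not implement strobemers.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def changed_kmer_positions(a, b, k):
    """Start positions at which the k-mers of two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    pairs = zip(kmers(a, k), kmers(b, k))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


# Toy sequence with one substitution (A -> C) at position 7, away from the ends.
ref = "ACGTTGCATGCCAGT"
pos = 7
mut = ref[:pos] + "C" + ref[pos + 1:]
changed = changed_kmer_positions(ref, mut, 4)
```

With k = 4, the four k-mers starting at positions 4 to 7 all overlap the mutated base, so a single substitution removes a run of k exact matches. Near the ends of a sequence fewer than k k-mers are affected.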
Affiliation(s)
- Kristoffer Sahlin
  - Department of Mathematics, Science for Life Laboratory, Stockholm University, 10691 Stockholm, Sweden
8
Elkrewi M, Moldovan MA, Picard MAL, Vicoso B. Schistosome W-linked genes inform temporal dynamics of sex chromosome evolution and suggest candidate for sex determination. Mol Biol Evol 2021; 38:5345-5358. [PMID: 34146097 PMCID: PMC8662593 DOI: 10.1093/molbev/msab178] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022] Open
Abstract
Schistosomes, the human parasites responsible for snail fever, are female-heterogametic. Different parts of their ZW sex chromosomes have stopped recombining in distinct lineages, creating "evolutionary strata" of various ages. While the Z-chromosome is well characterized at the genomic and molecular level, the W-chromosome has remained largely unstudied from an evolutionary perspective, as only a few W-linked genes have been detected outside of the model species Schistosoma mansoni. Here, we characterize the gene content and evolution of the W-chromosomes of S. mansoni and of the divergent species S. japonicum. We use a combined RNA/DNA k-mer based pipeline to assemble around one hundred candidate W-specific transcripts in each of the species. About half of them map to known protein coding genes, the majority homologous to S. mansoni Z-linked genes. We perform an extended analysis of the evolutionary strata present in the two species (including characterizing a previously undetected young stratum in S. japonicum) to infer patterns of sequence and expression evolution of W-linked genes at different time points after recombination was lost. W-linked genes show evidence of degeneration, including high rates of protein evolution and reduced expression. Most are found in young lineage-specific strata, with only a few high expression ancestral W-genes remaining, consistent with the progressive erosion of non-recombining regions. Among these, the splicing factor U2AF2 stands out as a promising candidate for primary sex determination, opening new avenues for understanding the molecular basis of the reproductive biology of this group.
Affiliation(s)
- Marwan Elkrewi
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
- Mikhail A Moldovan
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria; Skolkovo Institute of Science and Technology, Moscow, Russia
- Marion A L Picard
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria; Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, Banyuls-sur-Mer, France
- Beatriz Vicoso
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
9
Abstract
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation. We show that this problem is equivalent to finding a smallest path cover in a compacted de Bruijn graph. Using this reduction, we prove a lower bound on the size of the optimal SPSS and propose a greedy method called UST (Unitig-STitch) that results in a smaller representation than unitigs and is nearly optimal with respect to our lower bound. We demonstrate the usefulness of the SPSS formulation with two applications of UST. The first one is a compression algorithm, UST-Compress, which, we show, can store a set of k-mers by using an order-of-magnitude less disk space than other lossless compression tools. The second one is an exact static k-mer membership index, UST-FM, which, we show, improves index size by 10%-44% compared with other state-of-the-art low-memory indices.
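To make the spectrum-preserving idea concrete, the sketch below checks whether a set of strings represents a given k-mer set and compares the total number of characters needed by three representations of the same toy set. It assumes the usual reading of the definition: every string has length at least k, the strings together contain exactly the k-mers of the set, and no k-mer occurs more than once. The function names, the toy k-mer set, and the hand-built "stitched" example are assumptions for illustration; this is only a checker, not the UST algorithm, and reverse complements are ignored.

```python
def kmer_occurrences(strings, k):
    """Every k-mer occurrence across the strings, with repeats kept."""
    return [s[i:i + k] for s in strings for i in range(len(s) - k + 1)]


def is_spss(strings, kmer_set, k):
    """True if strings hold exactly the k-mers of kmer_set, each exactly once."""
    occ = kmer_occurrences(strings, k)
    return (
        all(len(s) >= k for s in strings)
        and len(occ) == len(set(occ))      # no k-mer repeated
        and set(occ) == set(kmer_set)      # same spectrum
    )


def weight(strings):
    """Total number of characters in the representation."""
    return sum(len(s) for s in strings)


k = 3
kmer_set = {"ACG", "CGT", "GTA", "GTC"}
as_kmers = sorted(kmer_set)          # one string per k-mer
unitigs = ["ACGT", "GTA", "GTC"]     # non-branching paths of the de Bruijn graph
stitched = ["ACGTA", "GTC"]          # one unitig glued onto a successor by hand
```

All three are valid representations of the same set, with weights 12, 10 and 8 characters respectively, which is the sense in which stitching unitigs together can shrink the representation further.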
Affiliation(s)
- Amatur Rahman
  - Department of Computer Science and Engineering, Penn State, University Park, State College, PA, USA
- Paul Medvedev
  - Department of Computer Science and Engineering, Penn State, University Park, State College, PA, USA
  - Department of Biochemistry and Molecular Biology, Penn State, University Park, State College, PA, USA
  - Center for Computational Biology and Bioinformatics, Penn State, University Park, State College, PA, USA
10
Player RA, Forsyth ER, Verratti KJ, Mohr DW, Scott AF, Bradburne CE. A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS. Life Sci Alliance 2021; 4:e202000902. [PMID: 33514656 PMCID: PMC7898556 DOI: 10.26508/lsa.202000902] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/03/2020] [Revised: 01/07/2021] [Accepted: 01/13/2021] [Indexed: 11/24/2022] Open
Abstract
Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes vary widely from the study population. A typical whole genome sequencing approach implies short-read technologies resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction of variant calls, increasing the chance of identifying true, low-effect size variants in genome-wide association studies. We believe that by incorporating the cost to produce a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
Affiliation(s)
- Robert A Player
  - Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
- Ellen R Forsyth
  - Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
- Kathleen J Verratti
  - Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
- David W Mohr
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Alan F Scott
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Christopher E Bradburne
  - Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA; McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
11
Abstract
The male-specific Y chromosome harbors genes important for sperm production. Because Y is repetitive, its DNA sequence was deciphered for only a few species, and its evolution remains elusive. Here we compared the Y chromosomes of great apes (human, chimpanzee, bonobo, gorilla, and orangutan) and found that many of their repetitive sequences and multicopy genes were likely already present in their common ancestor. Y repeats had increased intrachromosomal contacts, which might facilitate preservation of genes and gene regulatory elements. Chimpanzee and bonobo, experiencing high sperm competition, underwent many DNA changes and gene losses on the Y. Our research is significant for understanding the role of the Y chromosome in reproduction of nonhuman great apes, all of which are endangered.
The mammalian male-specific Y chromosome plays a critical role in sex determination and male fertility. However, because of its repetitive and haploid nature, it is frequently absent from genome assemblies and remains enigmatic. The Y chromosomes of great apes represent a particular puzzle: their gene content is more similar between human and gorilla than between human and chimpanzee, even though human and chimpanzee share a more recent common ancestor. To solve this puzzle, here we constructed a dataset including Ys from all extant great ape genera. We generated assemblies of bonobo and orangutan Ys from short and long sequencing reads and aligned them with the publicly available human, chimpanzee, and gorilla Y assemblies. Analyzing this dataset, we found that the genus Pan, which includes chimpanzee and bonobo, experienced accelerated substitution rates. Pan also exhibited elevated gene death rates. These observations are consistent with high levels of sperm competition in Pan. Furthermore, we inferred that the great ape common ancestor already possessed multicopy sequences homologous to most human and chimpanzee palindromes. Nonetheless, each species also acquired distinct ampliconic sequences. We also detected increased chromatin contacts between and within palindromes (from Hi-C data), likely facilitating gene conversion and structural rearrangements. Our results highlight the dynamic mode of Y chromosome evolution and open avenues for studies of male-specific dispersal in endangered great ape species.