1. Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] [Open access]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
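As a minimal illustration of the two ideas the abstract names, k-mer counting and absent sequences (nullomers), the sketch below counts overlapping k-mers in a toy DNA string and lists the k-mers that never occur. The function names `count_kmers` and `find_nullomers` and the toy sequence are assumptions made for this example; they are not taken from the review, and real analyses would typically rely on dedicated k-mer counting tools.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def find_nullomers(sequence, k):
    """Return the sorted k-mers over A/C/G/T that never occur in the sequence."""
    observed = count_kmers(sequence, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in observed)


# Hypothetical toy sequence, used only to exercise the two functions.
toy_genome = "ACGTACGTTTGA"
counts = count_kmers(toy_genome, 2)
absent = find_nullomers(toy_genome, 2)
```

Enumerating all 4^k candidate k-mers is only practical for small k; this brute-force approach is chosen here for clarity rather than scalability.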
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2. Wang XB, Lu HW, Liu QY, Li AL, Zhou HL, Zhang Y, Zhu TQ, Ruan J. An effective strategy for assembling the sex-limited chromosome. Gigascience 2024; 13:giae015. [PMID: 38626722 PMCID: PMC11020242 DOI: 10.1093/gigascience/giae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/17/2024] [Accepted: 03/15/2024] [Indexed: 04/18/2024] [Open access]
Abstract
BACKGROUND: Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS: We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY outperforms in the heterochromatic region and demonstrates comparable performance in other regions. Furthermore, SRY enhances the capabilities of the hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS: Our method enables true complete genome assembly and facilitates downstream research of sex-limited chromosomes.
Affiliation(s)
- Xiao-Bo Wang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
  - The Shennong Laboratory/Institute of Crop Molecular Breeding, Henan Academy of Agricultural Sciences, Zhengzhou 450002, China
- Hong-Wei Lu
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Qing-You Liu
  - Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, School of Life Science and Engineering, Foshan University, Foshan 528225, China
- A-Lun Li
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Hong-Ling Zhou
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Yong Zhang
  - Key Laboratory of Zoological Systematics and Evolution & State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Tian-Qi Zhu
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
3. Cechova M, Miga KH. Satellite DNAs and human sex chromosome variation. Semin Cell Dev Biol 2022; 128:15-25. [PMID: 35644878 DOI: 10.1016/j.semcdb.2022.04.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 04/26/2022] [Accepted: 04/27/2022] [Indexed: 11/17/2022]
Abstract
Satellite DNAs are present on every chromosome in the cell and are typically enriched in repetitive, heterochromatic parts of the human genome. Sex chromosomes represent a unique genomic and epigenetic context. In this review, we first report what is known about satellite DNA biology on human X and Y chromosomes, including repeat content and organization, as well as satellite variation in typical euploid individuals. Then, we review sex chromosome aneuploidies that are among the most common types of aneuploidies in the general population, and are better tolerated than autosomal aneuploidies. This is demonstrated also by the fact that aging is associated with the loss of the X, and especially the Y chromosome. In addition, supernumerary sex chromosomes enable us to study general processes in a cell, such as analyzing heterochromatin dosage (i.e. additional Barr bodies and long heterochromatin arrays on Yq) and their downstream consequences. Finally, genomic and epigenetic organization and regulation of satellite DNA could influence chromosome stability and lead to aneuploidy. In this review, we argue that the complete annotation of satellite DNA on sex chromosomes in human, and especially in centromeric regions, will aid in explaining the prevalence and the consequences of sex chromosome aneuploidies.
Affiliation(s)
- Monika Cechova
  - Faculty of Informatics, Masaryk University, Czech Republic
- Karen H Miga
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, CA 95064, USA
4. Elkrewi M, Moldovan MA, Picard MAL, Vicoso B. Schistosome W-linked genes inform temporal dynamics of sex chromosome evolution and suggest candidate for sex determination. Mol Biol Evol 2021; 38:5345-5358. [PMID: 34146097 PMCID: PMC8662593 DOI: 10.1093/molbev/msab178] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022] [Open access]
Abstract
Schistosomes, the human parasites responsible for snail fever, are female-heterogametic. Different parts of their ZW sex chromosomes have stopped recombining in distinct lineages, creating "evolutionary strata" of various ages. While the Z-chromosome is well characterized at the genomic and molecular level, the W-chromosome has remained largely unstudied from an evolutionary perspective, as only a few W-linked genes have been detected outside of the model species Schistosoma mansoni. Here, we characterize the gene content and evolution of the W-chromosomes of S. mansoni and of the divergent species S. japonicum. We use a combined RNA/DNA k-mer based pipeline to assemble around one hundred candidate W-specific transcripts in each of the species. About half of them map to known protein coding genes, the majority homologous to S. mansoni Z-linked genes. We perform an extended analysis of the evolutionary strata present in the two species (including characterizing a previously undetected young stratum in S. japonicum) to infer patterns of sequence and expression evolution of W-linked genes at different time points after recombination was lost. W-linked genes show evidence of degeneration, including high rates of protein evolution and reduced expression. Most are found in young lineage-specific strata, with only a few high expression ancestral W-genes remaining, consistent with the progressive erosion of non-recombining regions. Among these, the splicing factor U2AF2 stands out as a promising candidate for primary sex determination, opening new avenues for understanding the molecular basis of the reproductive biology of this group.
Affiliation(s)
- Marwan Elkrewi
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
- Mikhail A Moldovan
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Marion A L Picard
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
  - Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, Banyuls-sur-Mer, France
- Beatriz Vicoso
  - Institute of Science and Technology Austria, Am Campus 1, Klosterneuburg, 3400, Austria
5. Xiao C, Li J, Xie T, Chen J, Zhang S, Elaksher SH, Jiang F, Jiang Y, Zhang L, Zhang W, Xiang Y, Wu Z, Zhao S, Du X. The assembly of caprine Y chromosome sequence reveals a unique paternal phylogenetic pattern and improves our understanding of the origin of domestic goat. Ecol Evol 2021; 11:7779-7795. [PMID: 34188851 PMCID: PMC8216945 DOI: 10.1002/ece3.7611] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/30/2020] [Revised: 03/30/2021] [Accepted: 04/06/2021] [Indexed: 02/05/2023] [Open access]
Abstract
The mammalian Y chromosome offers a unique perspective on male reproduction and paternal evolutionary histories. However, further understanding of Y chromosome biology for most mammals is hindered by the lack of a Y chromosome assembly. This study presents an integrated in silico strategy for identifying and assembling the goat Y-linked scaffolds using existing data. A total of 11.5 Mb of Y-linked sequences were clustered into 33 scaffolds, and 187 protein-coding genes were annotated. We also identified a high abundance of repetitive elements. A 5.84 Mb subset was further ordered into an assembly with the evidence from the goat radiation hybrid map (RH map). The existing whole-genome resequencing data of 96 goats (worldwide distribution) were utilized to explore the paternal relationships among bezoars and domestic goats. Goat paternal lineages were clearly divided into two clades (Y1 and Y2), predating goat domestication. Demographic history analyses indicated that maternal lineages experienced a bottleneck effect around 2,000 YBP (years before present), after which goats belonging to the A haplogroup spread worldwide from the Near East. In contrast, paternal lineages experienced a population decline around 10,000 YBP. The evidence from the Y chromosome suggests that male goats were not affected by the worldwide transmission of the A haplogroup, which implies a sexually unbalanced contribution to the goat trade and population expansion in the post-Neolithic period.
Affiliation(s)
- Changyi Xiao
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Jingjin Li
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Tanghui Xie
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Jianhai Chen
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Sijia Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Salma Hassan Elaksher
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - Genetics and Genetic Engineering Department, Faculty of Agriculture, Benha University, Moshtohor, Egypt
- Fan Jiang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yaoxin Jiang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Lu Zhang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Wei Zhang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Yue Xiang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Zhenyang Wu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - College of Agroforestry Engineering and Planning, Tongren University, Tongren, China
- Shuhong Zhao
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaoyong Du
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
6. Sacks BN, Lounsberry ZT, Rando HM, Kluepfel K, Fain SR, Brown SK, Kukekova AV. Sequencing Red Fox Y Chromosome Fragments to Develop Phylogenetically Informative SNP Markers and Glimpse Male-Specific Trans-Pacific Phylogeography. Genes (Basel) 2021; 12:genes12010097. [PMID: 33466657 PMCID: PMC7828831 DOI: 10.3390/genes12010097] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 01/01/2021] [Accepted: 01/11/2021] [Indexed: 11/28/2022] [Open access]
Abstract
The red fox (Vulpes vulpes) has a wide global distribution with many ecotypes and has been bred in captivity for various traits, making it a useful evolutionary model system. The Y chromosome represents one of the most informative markers of phylogeography, yet it has not been well-studied in the red fox due to a lack of the necessary genomic resources. We used a target capture approach to sequence a portion of the red fox Y chromosome in a geographically diverse red fox sample, along with other canid species, to develop single nucleotide polymorphism (SNP) markers, 13 of which we validated for use in subsequent studies. Phylogenetic analyses of the Y chromosome sequences, including calibration to outgroups, confirmed previous estimates of the timing of two intercontinental exchanges of red foxes, the initial colonization of North America from Eurasia approximately half a million years ago and a subsequent continental exchange before the last Pleistocene glaciation (~100,000 years ago). However, in contrast to mtDNA, which showed unidirectional transfer from Eurasia to North America prior to the last glaciation, the Y chromosome appears to have been transferred from North America to Eurasia during this period. Additional sampling is needed to confirm this pattern and to further clarify red fox Y chromosome phylogeography.
Affiliation(s)
- Benjamin N. Sacks
  - Mammalian Ecology and Conservation Unit of the Veterinary Genetics Laboratory, University of California, Davis, CA 95616, USA
  - Department of Population Health and Reproduction, University of California, Davis, CA 95616, USA
- Zachary T. Lounsberry
  - Mammalian Ecology and Conservation Unit of the Veterinary Genetics Laboratory, University of California, Davis, CA 95616, USA
- Halie M. Rando
  - Department of Animal Sciences, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Kristopher Kluepfel
  - Mammalian Ecology and Conservation Unit of the Veterinary Genetics Laboratory, University of California, Davis, CA 95616, USA
- Steven R. Fain
  - U.S. Fish & Wildlife Service, National Forensics Laboratory, Ashland, OR 97520, USA
- Sarah K. Brown
  - Mammalian Ecology and Conservation Unit of the Veterinary Genetics Laboratory, University of California, Davis, CA 95616, USA
- Anna V. Kukekova
  - Department of Animal Sciences, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
7. Beier S, Ulpinnis C, Schwalbe M, Münch T, Hoffie R, Koeppel I, Hertig C, Budhagatapalli N, Hiekel S, Pathi KM, Hensel G, Grosse M, Chamas S, Gerasimova S, Kumlehn J, Scholz U, Schmutzer T. Kmasker plants - a tool for assessing complex sequence space in plant species. Plant J 2020; 102:631-642. [PMID: 31823436 DOI: 10.1111/tpj.14645] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/25/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 06/10/2023]
Abstract
Many plant genomes display high levels of repetitive sequences. The assembly of these complex genomes using short high-throughput sequence reads is still a challenging task. Underestimation or disregard of repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k-mer counting methods has proved to be reliable. Easy-to-use applications utilizing k-mer counting are in high demand, especially in the domain of plants. We present Kmasker plants, a tool that uses k-mer count information as an assistant throughout the analytical workflow of genome data and that is provided as a command-line and web-based solution. Besides its core competence to screen and mask repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species and methods that estimate target specificity of guide RNAs for application of site-directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre-computed indices for 10 of the economically most important cultivated plants. Source code for Kmasker plants has been made publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.
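The general idea of detecting and masking repeats from k-mer counts can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Kmasker plants implementation: the function name `mask_repeats`, the `max_count` threshold, and the choice to count k-mers within the sequence itself (rather than in a read set or pre-computed index) are all hypothetical.

```python
from collections import Counter


def mask_repeats(sequence, k, max_count):
    """Hard-mask (with 'N') every base covered by a k-mer seen more than max_count times."""
    seq = sequence.upper()
    n_kmers = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    masked = list(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] > max_count:
            # Every position spanned by a high-count k-mer is treated as repetitive.
            for j in range(i, i + k):
                masked[j] = "N"
    return "".join(masked)


# Hypothetical toy input: the tandem "ACG" repeat is masked, the unique tail is kept.
masked_example = mask_repeats("ACGACGACGTTAGC", 3, 1)
```

A soft-masking variant would write lowercase letters instead of `N`; the threshold would in practice depend on sequencing depth and genome composition.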
Affiliation(s)
- Sebastian Beier
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Chris Ulpinnis
  - Leibniz Institute of Plant Biochemistry, Bioinformatics and Scientific Data, 06120, Halle, Germany
- Markus Schwalbe
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Thomas Münch
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Robert Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Iris Koeppel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Christian Hertig
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Nagaveni Budhagatapalli
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Stefan Hiekel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Krishna M Pathi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Goetz Hensel
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Martin Grosse
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Sindy Chamas
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Sophia Gerasimova
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Jochen Kumlehn
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Thomas Schmutzer
  - Department of Natural Sciences III, Institute for Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, 06120, Halle, Germany
8. Palmer DH, Rogers TF, Dean R, Wright AE. How to identify sex chromosomes and their turnover. Mol Ecol 2019; 28:4709-4724. [PMID: 31538682 PMCID: PMC6900093 DOI: 10.1111/mec.15245] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 07/01/2019] [Revised: 09/05/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Although sex is a fundamental component of eukaryotic reproduction, the genetic systems that control sex determination are highly variable. In many organisms the presence of sex chromosomes is associated with female or male development. Although certain groups possess stable and conserved sex chromosomes, others exhibit rapid sex chromosome evolution, including transitions between male and female heterogamety, and turnover in the chromosome pair recruited to determine sex. These turnover events have important consequences for multiple facets of evolution, as sex chromosomes are predicted to play a central role in adaptation, sexual dimorphism, and speciation. However, our understanding of the processes driving the formation and turnover of sex chromosome systems is limited, in part because we lack a complete understanding of interspecific variation in the mechanisms by which sex is determined. New bioinformatic methods are making it possible to identify and characterize sex chromosomes in a diverse array of non-model species, rapidly filling in the numerous gaps in our knowledge of sex chromosome systems across the tree of life. In turn, this growing data set is facilitating and fueling efforts to address many of the unanswered questions in sex chromosome evolution. Here, we synthesize the available bioinformatic approaches to produce a guide for characterizing sex chromosome system and identity simultaneously across clades of organisms. Furthermore, we survey our current understanding of the processes driving sex chromosome turnover, and highlight important avenues for future research.
Affiliation(s)
- Daniela H. Palmer
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Thea F. Rogers
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Rebecca Dean
  - Department of Genetics, Evolution and Environment, University College London, London, UK
- Alison E. Wright
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
9. Rangavittal S, Stopa N, Tomaszkiewicz M, Sahlin K, Makova KD, Medvedev P. DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies. BMC Genomics 2019; 20:641. [PMID: 31399045 PMCID: PMC6688218 DOI: 10.1186/s12864-019-5996-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/28/2019] [Accepted: 07/26/2019] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND: Although the Y chromosome plays an important role in male sex determination and fertility, it is currently understudied due to its haploid and repetitive nature. Methods to isolate Y-specific contigs from a whole-genome assembly broadly fall into two categories. The first involves retrieving Y-contigs using proportion sharing with a female, but such a strategy is prone to false positives in the absence of a high-quality, complete female reference. A second strategy uses the ratio of depth of coverage from male and female reads to select Y-contigs, but such a method requires high-depth sequencing of a female and cannot utilize existing female references. RESULTS: We develop a k-mer based method called DiscoverY, which combines proportion sharing with female with depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes, across different sequencing platforms including Illumina, 10X, and PacBio. In the cases where the male and female data are of high quality, DiscoverY has a high precision and recall and outperforms existing methods. For cases when a high quality female reference is not available, we quantify the effect of using draft reference or even just raw sequencing reads from a female. CONCLUSION: DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect.
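The two signals named in the abstract, the proportion of a contig's k-mers shared with a female and the depth of coverage from male reads, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the use of the median k-mer count as a depth proxy, and the `max_shared` and `min_depth` thresholds are hypothetical and do not describe how DiscoverY itself computes or combines these features.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return the list of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def contig_features(contig, female_kmers, male_read_counts, k):
    """Return (proportion of k-mers shared with female, median male k-mer depth)."""
    ks = kmers(contig, k)
    if not ks:
        return 0.0, 0.0
    shared = sum(1 for x in ks if x in female_kmers) / len(ks)
    depth = median(male_read_counts.get(x, 0) for x in ks)
    return shared, depth


def is_y_candidate(shared, depth, max_shared=0.2, min_depth=1):
    """Hypothetical rule: little sharing with female, but supported by male reads."""
    return shared <= max_shared and depth >= min_depth


# Hypothetical toy data: one female reference string and three male reads.
k = 3
female_kmers = set(kmers("ACGTACGT", k))
male_read_counts = Counter(x for read in ["ACGTAC", "TTGGCC", "TTGGCC"] for x in kmers(read, k))

y_like = contig_features("TTGGCC", female_kmers, male_read_counts, k)
shared_like = contig_features("ACGTAC", female_kmers, male_read_counts, k)
```

In this toy, the first contig shares no k-mers with the female and is covered by male reads, so the hypothetical rule flags it; the second is fully shared with the female and is not flagged.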
Affiliation(s)
- Samarth Rangavittal
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Natasha Stopa
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
- Marta Tomaszkiewicz
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Kristoffer Sahlin
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
- Kateryna D Makova
  - Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA
  - The Genome Sciences Institute of the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Paul Medvedev
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
  - The Genome Sciences Institute of the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
10. Warris S, Schijlen E, van de Geest H, Vegesna R, Hesselink T, Te Lintel Hekkert B, Sanchez Perez G, Medvedev P, Makova KD, de Ridder D. Correcting palindromes in long reads after whole-genome amplification. BMC Genomics 2018; 19:798. [PMID: 30400848 PMCID: PMC6218980 DOI: 10.1186/s12864-018-5164-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/07/2018] [Accepted: 10/15/2018] [Indexed: 12/13/2022] [Open access]
Abstract
BACKGROUND: Next-generation sequencing requires sufficient DNA to be available. If limited, whole-genome amplification is applied to generate additional amounts of DNA. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads. RESULTS: Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. CONCLUSIONS: With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
Affiliation(s)
- Sven Warris
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands.
| | - Elio Schijlen
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
| | - Henri van de Geest
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands; Present address: Genetwister Technologies BV, Wageningen, The Netherlands
| | - Rahulsimham Vegesna
- Bioinformatics and Genomics Graduate Program, Pennsylvania State University, University Park, State College, PA, 16802, USA; Computation, Bioinformatics, Statistics Graduate Training Program, Pennsylvania State University, University Park, State College, PA, 16802, USA; The Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, State College, PA, 16802, USA
| | - Thamara Hesselink
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
| | - Bas Te Lintel Hekkert
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
| | - Gabino Sanchez Perez
- Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands; Present address: Genetwister Technologies BV, Wageningen, The Netherlands
| | - Paul Medvedev
- The Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, State College, PA, 16802, USA; Department of Computer Science and Engineering, Pennsylvania State University, University Park, State College, PA, 16802, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, State College, PA, 16802, USA; The Center for Medical Genomics, Pennsylvania State University, University Park, State College, PA, 16802, USA
| | - Kateryna D Makova
- The Center for Medical Genomics, Pennsylvania State University, University Park, State College, PA, 16802, USA; Department of Biology, Pennsylvania State University, University Park, State College, PA, 16802, USA
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands
| |
|
11
|
He W, Ju Y, Zeng X, Liu X, Zou Q. Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae. Front Microbiol 2018; 9:2174. [PMID: 30258427 PMCID: PMC6144933 DOI: 10.3389/fmicb.2018.02174] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/24/2018] [Accepted: 08/24/2018] [Indexed: 12/22/2022] Open
Abstract
With the rapid development of high-speed sequencing technologies and the implementation of many whole-genome sequencing projects, research in genomics is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assemblies, genome editing, directed evolution, DNA storage, and other cutting-edge technologies are emerging in succession. In particular, the rapid growth and development of DNA assembly technology may greatly advance the success of artificial life. Meanwhile, DNA assembly technology needs a large number of target sequences with known information as data support. Non-coding DNA (ncDNA) sequences occupy most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. Thus, it is necessary to develop machine-learning methods for predicting non-coding DNA sequences. In this study, we collected an ncDNA benchmark dataset of Saccharomyces cerevisiae and reported a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer features, using the support vector machine learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98. For the convenience of users, an online web server has been built at: http://server.malab.cn/Sc_ncDNAPred/index.jsp.
Affiliation(s)
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Xiangxiang Zeng
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Xiangrong Liu
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China; Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| |
|
12
|
Morris J, Darolti I, Bloch NI, Wright AE, Mank JE. Shared and Species-Specific Patterns of Nascent Y Chromosome Evolution in Two Guppy Species. Genes (Basel) 2018; 9:E238. [PMID: 29751570 PMCID: PMC5977178 DOI: 10.3390/genes9050238] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 03/27/2018] [Revised: 04/20/2018] [Accepted: 04/26/2018] [Indexed: 11/22/2022] Open
Abstract
Sex chromosomes form once recombination is halted around the sex-determining locus between a homologous pair of chromosomes, resulting in a male-limited Y chromosome. We recently characterized the nascent sex chromosome system in the Trinidadian guppy (Poecilia reticulata). The guppy Y is one of the youngest animal sex chromosomes yet identified, and therefore offers a unique window into the early evolutionary forces shaping sex chromosome formation, particularly the rate of accumulation of repetitive elements and Y-specific sequence. We used comparisons between male and female genomes in P. reticulata and its sister species, Endler’s guppy (P. wingei), which share an ancestral sex chromosome, to identify male-specific sequences and to characterize the degree of differentiation between the X and Y chromosomes. We identified male-specific sequence shared between P. reticulata and P. wingei consistent with a small ancestral non-recombining region. Our assembly of this Y-specific sequence shows substantial homology to the X chromosome, and appears to be significantly enriched for genes implicated in pigmentation. We also found two plausible candidates that may be involved in sex determination. Furthermore, we found that the P. wingei Y chromosome exhibits a greater signature of repetitive element accumulation than the P. reticulata Y chromosome. This suggests that Y chromosome divergence does not necessarily correlate with the time since recombination suppression. Overall, our results reveal the early stages of Y chromosome divergence in the guppy.
Affiliation(s)
- Jake Morris
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK.
| | - Iulia Darolti
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK.
| | - Natasha I Bloch
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK.
| | - Alison E Wright
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK.
| | - Judith E Mank
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK.
- Department of Organismal Biology, Uppsala University, 752 36 Uppsala, Sweden.
| |
|