1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Zbinden ZD, Douglas MR, Chafin TK, Douglas ME. A community genomics approach to natural hybridization. Proc Biol Sci 2023; 290:20230768. [PMID: 37192670 PMCID: PMC10188237 DOI: 10.1098/rspb.2023.0768] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Accepted: 04/26/2023] [Indexed: 05/18/2023]
Abstract
Hybridization is a complicated, oft-misunderstood process. Once deemed unnatural and uncommon, hybridization is now recognized as ubiquitous among species. But hybridization rates within and among communities are poorly understood despite the relevance to ecology, evolution and conservation. To clarify, we examined hybridization across 75 freshwater fish communities within the Ozarks of the North American Interior Highlands (USA) by single nucleotide polymorphism (SNP) genotyping 33 species (N = 2865 individuals; double-digest restriction site-associated DNA sequencing (ddRAD)). We found evidence of hybridization (70 putative hybrids; 2.4% of individuals) among 18 species-pairs involving 73% (24/33) of study species, with the majority being concentrated within one family (Leuciscidae/minnows; 15 species; 66 hybrids). Interspecific genetic exchange, or introgression, was evident from 24 backcrossed individuals (10/18 species-pairs). Hybrids occurred within 42 of 75 communities (56%). Four selected environmental variables (species richness, protected area extent, precipitation (May and annually)) exhibited 73-78% accuracy in predicting hybrid occurrence via random forest classification. Our community-level assessment identified hybridization as spatially widespread and environmentally dependent (albeit predominantly within one diverse, omnipresent family). Our approach provides a more holistic survey of natural hybridization by testing a wide range of species-pairs, thus contrasting with more conventional evaluations.
Affiliation(s)
- Zachery D. Zbinden
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Marlis R. Douglas
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Tyler K. Chafin
  - Biomathematics and Statistics Scotland, Edinburgh, Scotland, UK
- Michael E. Douglas
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
3
Evolutionary Insight into the Trypanosomatidae Using Alignment-Free Phylogenomics of the Kinetoplast. Pathogens 2019; 8:pathogens8030157. [PMID: 31540520 PMCID: PMC6789588 DOI: 10.3390/pathogens8030157] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/23/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Advancements in next-generation sequencing techniques have led to a substantial increase in the genomic information available for analyses in evolutionary biology. As such, these data require exponential growth in the bioinformatic methods and expertise needed to understand such vast quantities of genomic data. Alignment-free phylogenomics offers an alternative approach for large-scale analyses that may have the potential to address these challenges. The evolutionary relationships between various species within the trypanosomatid family, specifically members belonging to the genera Leishmania and Trypanosoma, have been extensively studied over the last 30 years. However, there is a need for a more exhaustive analysis of the Trypanosomatidae, summarising the evolutionary patterns amongst the entire family of these important protists. The mitochondrial DNA of the trypanosomatids, better known as the kinetoplast, represents a valuable taxonomic marker given its unique presence across all kinetoplastid protozoans. The aim of this study was to validate the reliability and robustness of alignment-free approaches for phylogenomic analyses and their applicability to reconstructing the evolutionary relationships within the trypanosomatid family. In the present study, alignment-free analyses demonstrated the strength of these methods, particularly when dealing with large datasets, compared to the traditional phylogenetic approaches. We present a maxicircle genome phylogeny of 46 species spanning the trypanosomatid family, demonstrating the superiority of the maxicircle for the analysis and taxonomic resolution of the Trypanosomatidae.
4
Zhang F, Ding Y, Zhu C, Zhou X, Orr MC, Scheu S, Luan Y. Phylogenomics from low‐coverage whole‐genome sequencing. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13145] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/30/2022]
Affiliation(s)
- Feng Zhang
  - Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, P. R. China
  - Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, P. R. China
  - J. F. Blumenbach Institute of Zoology and Anthropology, University of Göttingen, Göttingen, Germany
- Yinhuan Ding
  - Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, P. R. China
- Chao‐Dong Zhu
  - Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, P. R. China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, P. R. China
- Xin Zhou
  - Department of Entomology, China Agricultural University, Beijing, P. R. China
- Michael C. Orr
  - Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, P. R. China
- Stefan Scheu
  - J. F. Blumenbach Institute of Zoology and Anthropology, University of Göttingen, Göttingen, Germany
- Yun‐Xia Luan
  - Guangdong Provincial Key Laboratory of Insect Developmental Biology and Applied Technology, Institute of Insect Science and Technology, School of Life Sciences, South China Normal University, Guangzhou, P. R. China