1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Mouratidis I, Baltoumas FA, Chantzi N, Patsakis M, Chan CS, Montgomery A, Konnaris MA, Aplakidou E, Georgakopoulos GC, Das A, Chartoumpekis DV, Kovac J, Pavlopoulos GA, Georgakopoulos-Soares I. kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Comput Struct Biotechnol J 2024; 23:1919-1928. [PMID: 38711760 PMCID: PMC11070822 DOI: 10.1016/j.csbj.2024.04.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024]
Abstract
The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
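The two definitions in this abstract (quasi-primes as species-specific k-mers, primes as k-mers absent from every sequence) can be illustrated with a small set-based sketch. This is not the kmerDB implementation; the function names, the toy sequences, and the DNA alphabet default are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(sequences, k):
    """Map each species name to the k-mers found in that species only."""
    per_species = {name: kmers(seq, k) for name, seq in sequences.items()}
    # Count in how many species each k-mer occurs.
    species_count = Counter(w for ks in per_species.values() for w in ks)
    return {name: {w for w in ks if species_count[w] == 1}
            for name, ks in per_species.items()}


def primes(sequences, k, alphabet="ACGT"):
    """Return the k-mers over alphabet that occur in none of the sequences."""
    seen = set().union(*(kmers(seq, k) for seq in sequences.values()))
    return {"".join(p) for p in product(alphabet, repeat=k)} - seen


# Hypothetical toy input: two very short made-up "genomes".
toy = {"species_a": "ACGT", "species_b": "CGTT"}
print(quasi_primes(toy, 2))
print(len(primes(toy, 2)))
```

Enumerating every possible k-mer is only feasible for small k; at the scale described in the abstract a dedicated k-mer counter or index would be needed.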
Affiliation(s)
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
  - Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- George C. Georgakopoulos
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
- Anshuman Das
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Dionysios V. Chartoumpekis
  - Service of Endocrinology, Diabetology and Metabolism, Lausanne University Hospital, Lausanne, Switzerland
- Jasna Kovac
  - Department of Food Science, The Pennsylvania State University, University Park, PA 16802, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, 11527, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
3
Ali N, Wolf C, Kanchan S, Veerabhadraiah SR, Bond L, Turner MW, Jorcyk CL, Hampikian G. 9S1R nullomer peptide induces mitochondrial pathology, metabolic suppression, and enhanced immune cell infiltration, in triple-negative breast cancer mouse model. Biomed Pharmacother 2024; 170:115997. [PMID: 38118350 PMCID: PMC10872342 DOI: 10.1016/j.biopha.2023.115997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/22/2023]
Abstract
Nullomers are the shortest strings of absent amino acid (aa) sequences in a species or group of species. Primes are those nullomers that have not been detected in the genome of any species. 9S1R is a 5-aa peptide prime sequence attached to 5 arginine aa, used to treat triple-negative breast cancer (TNBC) in an in vivo mouse model. This unique peptide, administered with a trehalose carrier (9S1R-NulloPT), offers enhanced solubility and exhibits distinct anti-cancer effects against TNBC. In our study, we investigated the effect of 9S1R-NulloPT on tumor growth, metabolism, metastatic burden, tumor immune microenvironment (TME), and transcriptome of aggressive mouse TNBC tumors. Notably, treated mice had smaller tumors in the initial phase of the treatment, as compared to untreated controls, and diminished in vivo and ex vivo bioluminescence at later stages, indicative of metabolically quiescent, dying tumors. The treatment also caused changes in the TME, with increased infiltration of immune cells, and altered the tumor transcriptome, with 365 upregulated genes and 710 downregulated genes. Consistent with in vitro data, downregulated genes were enriched in cellular metabolic processes (179), specifically mitochondrial TCA cycle/oxidative phosphorylation (44), and translation machinery/ribosome biogenesis (45). The upregulated genes were associated with the developmental (13), ECM organization (12) and focal adhesion (7) pathways. In conclusion, our study demonstrates that 9S1R-NulloPT effectively reduced tumor growth during its initial phase, altering the TME and tumor transcriptome. The treatment induced mitochondrial pathology, which led to a metabolic deceleration in tumors, aligning with in vitro observations.
Affiliation(s)
- Nilufar Ali
  - Department of Biological Sciences, Boise State University, Boise, ID, USA
- Cody Wolf
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Swarna Kanchan
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, USA
- Shivakumar R Veerabhadraiah
  - Department of Orthopaedics, University of Utah, Salt Lake City, UT, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Laura Bond
  - Center of Biomedical Research Excellence in Matrix Biology, Boise State University, Boise, ID, USA
- Matthew W Turner
  - Biomolecular Research Center, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Cheryl L Jorcyk
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Biomolecular Research Center, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Greg Hampikian
  - Department of Biological Sciences, Boise State University, Boise, ID, USA
4
Ali N, Wolf C, Kanchan S, Veerabhadraiah SR, Bond L, Turner MW, Jorcyk CL, Hampikian G. Nullomer peptide increases immune cell infiltration and reduces tumor metabolism in triple negative breast cancer mouse model. Research Square 2023:rs.3.rs-3097552. [PMID: 37461536 PMCID: PMC10350184 DOI: 10.21203/rs.3.rs-3097552/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
Background: Nullomers are the shortest strings of absent amino acid (aa) sequences in a species or group of species. Primes are those nullomers that have not been detected in the genome of any species. 9S1R is a 5-aa peptide derived from a prime sequence that is tagged with 5 arginine aa, used to treat triple negative breast cancer (TNBC) in an in vivo TNBC mouse model. 9S1R is administered in trehalose (9S1R-NulloPT), which enhances solubility and exhibits some independent effects against tumor growth and is thus an important component in the drug preparation.
Method: We examined the effect of 9S1R-NulloPT on tumor growth, metabolism, metastatic burden, necrosis, tumor immune microenvironment, and the transcriptome of aggressive mouse TNBC tumors.
Results: The peptide-treated mice had smaller tumors in the initial phase of the treatment, as compared to the untreated control, and reduced in vivo bioluminescence at later stages, which is indicative of metabolically inactive tumors. A decrease in ex vivo bioluminescence was also observed in the excised tumors of treated mice, but not in the secondary metastasis in the lungs. The treatment also caused changes in the tumor immune microenvironment, with increased infiltration of immune cells and margin inflammation. The treatment upregulated 365 genes and downregulated 710 genes in tumors compared to the untreated group. Consistent with in vitro findings in breast cancer cell lines, downregulated genes in the treated TNBC tumors include cellular metabolic process related genes (179), specifically mitochondrial genes associated with TCA cycle/oxidative phosphorylation (44), and translation machinery/ribosome biogenesis genes (45). Among upregulated genes, the developmental pathway (13), ECM organization (12) and focal adhesion related pathways (7) were noteworthy. We also present data from a pilot study using a bilateral BC mouse model, which supports our findings.
Conclusion: Although 9S1R-NulloPT was only moderately effective at reducing the tumor volume, it altered the tumor immune microenvironment as well as the tumor transcriptome, rendering tumors metabolically less active by downregulating mitochondrial function and ribosome biogenesis. This corroborates previously published in vitro findings.
5
Stuart JD, Hartman DA, Gray LI, Jones AA, Wickenkamp NR, Hirt C, Safira A, Regas AR, Kondash TM, Yates ML, Driga S, Snow CD, Kading RC. Mosquito tagging using DNA-barcoded nanoporous protein microcrystals. PNAS Nexus 2022; 1:pgac190. [PMID: 36714845 PMCID: PMC9802479 DOI: 10.1093/pnasnexus/pgac190] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/04/2022] [Accepted: 09/08/2022] [Indexed: 02/01/2023]
Abstract
Conventional mosquito marking technology for mark-release-recapture (MRR) is quite limited in terms of information capacity and efficacy. To overcome both challenges, we have engineered, lab-tested, and field-evaluated a new class of marker particles, in which synthetic, short DNA oligonucleotides (DNA barcodes) are adsorbed and protected within tough, crosslinked porous protein microcrystals. Mosquitoes self-mark through ingestion of microcrystals in their larval habitat. Barcoded microcrystals persist trans-stadially through mosquito development if ingested by larvae, do not significantly affect adult mosquito survivorship, and individual barcoded mosquitoes are detectable in pools of up to at least 20 mosquitoes. We have also demonstrated crystal persistence following adult mosquito ingestion. Barcode sequences can be recovered by qPCR and next-generation sequencing (NGS) without detectable amplification of native mosquito DNA. These DNA-laden protein microcrystals have the potential to radically increase the amount of information obtained from future MRR studies compared to previous studies employing conventional mosquito marking materials.
Affiliation(s)
- Lyndsey I Gray
  - Department of Microbiology, Immunology, and Pathology, Colorado State University, Fort Collins, CO 80523, USA
- Alec A Jones
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Natalie R Wickenkamp
  - Department of Microbiology, Immunology, and Pathology, Colorado State University, Fort Collins, CO 80523, USA
- Aya Safira
  - Present address: Just-Evotec Biologics, Seattle, WA 98109, USA
- April R Regas
  - College of Veterinary Medicine and Biological Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Therese M Kondash
  - Department of Environmental Health and Radiological Sciences, Colorado State University, Fort Collins, CO 80523, USA; H3 Environmental, Albuquerque, NM 87109 (current)
- Margaret L Yates
  - Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
- Sergei Driga
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado 80523, USA
- Christopher D Snow
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA; School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA; Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA; Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado 80523, USA
- Rebekah C Kading
  - To whom correspondence should be addressed: 176 CVID, Colorado State University, Fort Collins, CO 80523, USA. Tel: (970) 491-7833
6
Koulouras G, Frith MC. Significant non-existence of sequences in genomes and proteomes. Nucleic Acids Res 2021; 49:3139-3155. [PMID: 33693858 PMCID: PMC8034619 DOI: 10.1093/nar/gkab139] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/04/2021] [Revised: 02/11/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
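The notion of a minimal absent word can be made concrete with a brute-force sketch: a word is a MAW of a sequence if it does not occur in the sequence while both its longest proper prefix and its longest proper suffix do. The code below is an illustration under simplifying assumptions (single strand only, no reverse complements, no Markov-model significance test); it is not the authors' method, and the function names and `max_len` parameter are hypothetical.

```python
def substrings_up_to(seq, max_len):
    """Return every substring of seq with length 1..max_len."""
    return {seq[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)}


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Return the MAWs of seq with length at most max_len (brute force)."""
    present = substrings_up_to(seq, max_len)
    present.add("")  # the empty word occurs in every sequence
    maws = set()
    for prefix in present:
        if len(prefix) >= max_len:
            continue
        for letter in alphabet:
            word = prefix + letter
            # The prefix is present by construction, so the word is minimal
            # when it is absent and its longest proper suffix is present.
            if word not in present and word[1:] in present:
                maws.add(word)
    return maws


print(sorted(minimal_absent_words("ACGTA", 3)))
```

Storing all substrings costs memory proportional to the sequence length times `max_len`, so this only suits short inputs; linear-time MAW algorithms based on suffix structures exist for genome-scale data.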
Affiliation(s)
- Grigorios Koulouras
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan
7
Vergni D, Gaudio R, Santoni D. The farther the better: Investigating how distance from human self affects the propensity of a peptide to be presented on cell surface by MHC class I molecules, the case of Trypanosoma cruzi. PLoS One 2020; 15:e0243285. [PMID: 33284846 PMCID: PMC7721184 DOI: 10.1371/journal.pone.0243285] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2020] [Accepted: 11/19/2020] [Indexed: 12/04/2022]
Abstract
More than twenty years ago the reverse vaccinology paradigm emerged, aiming to design new vaccines by analyzing genomic information in order to select those pathogen peptides able to trigger an immune response. In this context, focusing on the proteome of Trypanosoma cruzi, we investigated the link between the probability that a pathogen peptide is presented on a cell surface and its distance from human self. We found a reasonable but, as far as we know, previously unreported property: the greater the distance between a peptide and the human self, the higher the probability that the peptide is presented on a cell surface. We also found that the peptides most distant from human self bind, on average, a broader collection of HLAs than expected, implying a potential immunological role in a large portion of individuals. Finally, introducing a novel quantitative indicator of a peptide's potential immunological role, we proposed a pool of peptides that could be potential epitopes and are suitable for experimental testing. The software to compute peptide classes according to the distance from human self is freely available at http://www.iasi.cnr.it/~dsantoni/nullomers.
Affiliation(s)
- Davide Vergni
  - Istituto per le Applicazioni del Calcolo “Mauro Picone” - CNR, Rome, Italy
- Rosanna Gaudio
  - Department of Biology, University Tor Vergata, Rome, Italy
- Daniele Santoni
  - Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” - CNR, Rome, Italy
8
9
Alileche A, Hampikian G. The effect of Nullomer-derived peptides 9R, 9S1R and 124R on the NCI-60 panel and normal cell lines. BMC Cancer 2017; 17:533. [PMID: 28793867 PMCID: PMC5551024 DOI: 10.1186/s12885-017-3514-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/11/2017] [Accepted: 07/28/2017] [Indexed: 12/29/2022]
Abstract
BACKGROUND: Nullomer peptides are the smallest sequences absent from databases of natural proteins. We first began compiling a list of absent 5-amino acid strings in 2006 (1). We report here the effects of Nullomer-derived peptides 9R, 9S1R and 124R on the NCI-60 panel, derived from human cancers of 9 organs (kidney, ovary, skin melanoma, lung, brain, breast, colon, prostate and the hematopoietic system), and four normal cell lines (endothelial HUVEC, skin fibroblasts BJ, colon epithelial FHC and normal prostate RWPE-1).
METHODS: The NCI-60 cancer cell panel and four normal cell lines were cultured in vitro in RPMI1640 supplemented with 10% Hyclone fetal bovine serum and exposed for 48 h to 5 μM, 25 μM and 50 μM of peptides 9R, 9S1R and 124R. Viability was assessed by CCK-8 assay. To measure peptide-induced ATP depletion, one cell line representing each organ in the NCI-60 panel and the four normal cell lines were exposed to 50 μM of peptides 9R, 9S1R and 124R for 3 h. The ATP content was assessed in whole cells and their supernatants.
RESULTS: Peptides 9S1R and 9R are lethal to 95% and 81.6%, respectively, of the 60 cancer cell lines tested. Control peptide 124R has no effect on the growth of these cells. Especially interesting is the fact that peptides 9R and 9S1R are capable of killing drug-resistant and hormone-resistant cell lines, and even cancer stem cells. Peptides 9R and 9S1R have a broader activity spectrum than many cancer drugs in current use, can completely deplete cellular ATP within 3 h, and are less toxic to 3 of the 4 normal cell lines tested than they are to several cancers.
CONCLUSIONS: Nullomer peptides 9R and 9S1R have a broad lethal effect on cancer cell lines derived from the nine organs represented in the NCI-60 panel. This broad activity crosses many of the categorical divisions used in the general classification of cancers: solid vs liquid cancers, drug sensitive vs drug resistant, hormone sensitive vs hormone resistant, cytokine sensitive vs cytokine non-sensitive, slow growing vs rapid growing, differentiated vs dedifferentiated cancers. Furthermore, peptides 9R and 9S1R are lethal to cancer stem cells and breast carcinosarcoma.
Affiliation(s)
- Abdelkrim Alileche
  - Biology Department Room SN-215, Boise State University, 1910 University Drive, Boise, ID 83725 USA
- Greg Hampikian
  - Biology Department Room SN-215, Boise State University, 1910 University Drive, Boise, ID 83725 USA
10
Vergni D, Santoni D. Nullomers and High Order Nullomers in Genomic Sequences. PLoS One 2016; 11:e0164540. [PMID: 27906971 PMCID: PMC5132333 DOI: 10.1371/journal.pone.0164540] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/25/2016] [Accepted: 09/27/2016] [Indexed: 11/19/2022]
Abstract
A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, even when taking CpG islands into account, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications.
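The two definitions in this abstract can be sketched in a few lines. The code below treats a "first-order" nullomer as a nullomer whose every single-substitution mutant is also a nullomer; that reading of "mutated sequences", the single-strand treatment, and all function names are assumptions for illustration and may differ from the paper's exact definition.

```python
from itertools import product


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(seq, k, alphabet="ACGT"):
    """Return all k-mers over alphabet that never occur in seq."""
    every_kmer = {"".join(p) for p in product(alphabet, repeat=k)}
    return every_kmer - kmers_present(seq, k)


def single_mutants(word, alphabet="ACGT"):
    """Yield every word that differs from word by one substitution."""
    for i, original in enumerate(word):
        for letter in alphabet:
            if letter != original:
                yield word[:i] + letter + word[i + 1:]


def first_order_nullomers(seq, k, alphabet="ACGT"):
    """Return nullomers whose single-substitution mutants are all nullomers."""
    absent = nullomers(seq, k, alphabet)
    return {w for w in absent
            if all(m in absent for m in single_mutants(w, alphabet))}


print(sorted(first_order_nullomers("AAAA", 2)))
```

Higher orders could be obtained by repeating the mutant check on the resulting set, again assuming that is the intended generalization.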
Affiliation(s)
- Davide Vergni
  - Istituto per le Applicazioni del Calcolo “Mauro Picone” - CNR, Via dei Taurini 19, 00185, Rome, Italy
- Daniele Santoni
  - Istituto di Analisi dei Sistemi ed Informatica “Antonio Ruberti” - CNR, Via dei Taurini 19, 00185, Rome, Italy
11
Falda M, Fontana P, Barzon L, Toppo S, Lavezzo E. keeSeek: searching distant non-existing words in genomes for PCR-based applications. Bioinformatics 2014; 30:2662-4. [PMID: 24867942 DOI: 10.1093/bioinformatics/btu312] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
The search for short words that are absent in the genome of one or more organisms (neverwords, also known as nullomers) is attracting growing interest because of the impact they may have in recent molecular biology applications. keeSeek is able to find absent sequences with primer-like features, which can be used as unique labels for exogenously inserted DNA fragments to recover their exact position in the genome using PCR techniques. The main differences with respect to previously developed tools for neverword generation are (i) calculation of the distance from the reference genome, in terms of number of mismatches, and selection of the most distant sequences, which will have a low probability of annealing nonspecifically; (ii) application of a series of filters to discard candidates not suitable to be used as PCR primers. keeSeek has been implemented in C++ and CUDA (Compute Unified Device Architecture) to work in a General-Purpose Computing on Graphics Processing Units (GPGPU) environment. Availability and implementation: Freely available under the Q Public License at http://www.medcomp.medicina.unipd.it/main_site/doku.php?id=keeseek.
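The distance criterion in point (i) can be illustrated with a brute-force CPU sketch: score each candidate word by its smallest number of mismatches against any same-length window of the reference, then keep the words with the largest score. This is not keeSeek's C++/CUDA implementation; it ignores the reverse strand and the primer filters of point (ii), and the function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_genome(word, genome):
    """Smallest mismatch count between word and any window of the genome."""
    k = len(word)
    return min(hamming(word, genome[i:i + k])
               for i in range(len(genome) - k + 1))


def most_distant_words(genome, k, alphabet="ACGT"):
    """Return (best distance, sorted words reaching it) among all k-mers."""
    scores = {"".join(p): min_distance_to_genome("".join(p), genome)
              for p in product(alphabet, repeat=k)}
    best = max(scores.values())
    return best, sorted(w for w, d in scores.items() if d == best)


# Hypothetical toy reference: every window is "AAA".
best, words = most_distant_words("AAAAAA", 3)
print(best, len(words))
```

The cost grows as the number of candidate words times the genome length, which is presumably why the original tool relies on GPU parallelism; the sketch is only practical for toy inputs.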
Affiliation(s)
- Marco Falda
  - Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy
- Paolo Fontana
  - Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy
- Luisa Barzon
  - Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy
- Stefano Toppo
  - Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy
- Enrico Lavezzo
  - Department of Molecular Medicine, University of Padova, Padova, I-35131, Italy and Department of Computational Biology, Edmund Mach Foundation, S. Michele All'Adige, I-38010 (TN), Italy