1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
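As a minimal illustration of what the abstract means by k-mers, the sketch below counts every overlapping substring of length k in a sequence. The function name `count_kmers` and the toy sequence are assumptions made for illustration; they do not come from the review.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all when the sequence is shorter than k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy example (hypothetical input): 3-mers of an 8-base sequence.
counts = count_kmers("ACGTACGT", 3)
```

A hash-based counter like this is fine for short inputs; the dedicated tools surveyed in the review target far larger datasets.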
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Mouratidis I, Baltoumas FA, Chantzi N, Patsakis M, Chan CS, Montgomery A, Konnaris MA, Aplakidou E, Georgakopoulos GC, Das A, Chartoumpekis DV, Kovac J, Pavlopoulos GA, Georgakopoulos-Soares I. kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Comput Struct Biotechnol J 2024; 23:1919-1928. [PMID: 38711760 PMCID: PMC11070822 DOI: 10.1016/j.csbj.2024.04.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024] Open
Abstract
The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
Affiliation(s)
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
  - Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- George C. Georgakopoulos
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
- Anshuman Das
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Dionysios V. Chartoumpekis
  - Service of Endocrinology, Diabetology and Metabolism, Lausanne University Hospital, Lausanne, Switzerland
- Jasna Kovac
  - Department of Food Science, The Pennsylvania State University, University Park, PA 16802, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, 11527, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
3
Chantzi N, Mareboina M, Konnaris MA, Montgomery A, Patsakis M, Mouratidis I, Georgakopoulos-Soares I. The determinants of the rarity of nucleic and peptide short sequences in nature. NAR Genom Bioinform 2024; 6:lqae029. [PMID: 38584871 PMCID: PMC10993293 DOI: 10.1093/nargab/lqae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024] Open
Abstract
The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences across nature or within each domain of life and viruses separately. When examining each of the three domains of life and viruses separately, the R² performance of the model predicting rarity for 5-mer peptides from mono- and dipeptides ranged between 0.814 and 0.932. A separate model predicting rarity for 10-mer oligonucleotides from mono- and dinucleotides achieved R² performance between 0.408 and 0.606. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the mono- and dipeptide composition of peptide sequences can explain a significant proportion of the variance in their frequencies in nature.
Affiliation(s)
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Maxwell A Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Department of Statistics, Penn State University, University Park, PA, 16802, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Michail Patsakis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
  - Huck Institutes of the Life Sciences, Penn State University, University Park, PA, 16802, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA
4
Montgomery A, Tsiatsianis GC, Mouratidis I, Chan CSY, Athanasiou M, Papanastasiou AD, Kantere V, Syrigos N, Vathiotis I, Syrigos K, Yee NS, Georgakopoulos-Soares I. Utilizing nullomers in cell-free RNA for early cancer detection. Cancer Gene Ther 2024; 31:861-870. [PMID: 38351138 PMCID: PMC11192629 DOI: 10.1038/s41417-024-00741-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/31/2023] [Revised: 01/25/2024] [Accepted: 01/26/2024] [Indexed: 06/23/2024]
Abstract
Early detection of cancer can significantly improve patient outcomes; however, sensitive and highly specific biomarkers for cancer detection are currently missing. Nullomers are the shortest sequences that are absent from the human genome but can emerge due to somatic mutations in cancer. We examine over 10,000 whole exome sequencing matched tumor-normal samples to characterize nullomer emergence across exonic regions of the genome. We also identify nullomer emerging mutational hotspots within tumor genes. Finally, we provide evidence for the identification of nullomers in cell-free RNA from peripheral blood samples, enabling detection of multiple tumor types. We show multiple tumor classification models with an AUC greater than 0.9, including a hepatocellular carcinoma classifier with an AUC greater than 0.99.
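To make the notion of a nullomer "emerging" through mutation concrete, the following sketch enumerates the k-mers absent from a toy reference and then reports which of them are created by a single-base substitution. Everything here is an illustrative assumption rather than the authors' pipeline: the function names, the fixed k, the toy reference, and the choice to ignore reverse complements and indels.

```python
from itertools import product


def kmers_present(seq: str, k: int) -> set:
    """Return the set of overlapping k-mers occurring in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def absent_kmers(reference: str, k: int, alphabet: str = "ACGT") -> set:
    """All k-mers over `alphabet` that never occur in `reference`."""
    every_kmer = {"".join(p) for p in product(alphabet, repeat=k)}
    return every_kmer - kmers_present(reference, k)


def emerging_nullomers(reference: str, position: int, alt_base: str, k: int) -> set:
    """Absent k-mers created by substituting `alt_base` at 0-based `position`."""
    absent = absent_kmers(reference, k)
    mutated = reference[:position] + alt_base + reference[position + 1:]
    # Only k-mers overlapping the mutated base can be new.
    start = max(0, position - k + 1)
    end = min(len(mutated), position + k)
    return kmers_present(mutated[start:end], k) & absent


# Toy reference (hypothetical): only AC, CG, GT and TA occur as 2-mers.
reference = "ACGTACGTAC"
absent = absent_kmers(reference, 2)
emerged = emerging_nullomers(reference, position=2, alt_base="T", k=2)
```

On this toy input the G-to-T substitution at position 2 creates the 2-mers `CT` and `TT`, neither of which occurs in the reference.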
Affiliation(s)
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Georgios Christos Tsiatsianis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S Y Chan
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Maria Athanasiou
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Verena Kantere
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Nikos Syrigos
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Ioannis Vathiotis
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Konstantinos Syrigos
  - Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
- Nelson S Yee
  - Next Generation Therapies Program, Penn State Cancer Institute; Division of Hematology-Oncology, Department of Medicine, Penn State Health Milton S. Hershey Medical Center, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
5
Wang S, Meng F, Chen P, Lv Y, Wu M, Tang H, Bao H, Wu X, Shao Y, Wang J, Dai J, Xu L, Wang X, Yin R. Cell-free DNA assay for malignancy classification of high-risk lung nodules. J Thorac Cardiovasc Surg 2024:S0022-5223(24)00370-2. [PMID: 38670484 DOI: 10.1016/j.jtcvs.2024.04.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 03/18/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024]
Abstract
OBJECTIVE: Although low-dose computed tomography has been proven effective in reducing lung cancer-specific mortality, a considerable proportion of surgically resected high-risk lung nodules were still confirmed to be pathologically benign. There is an unmet need for a novel method for malignancy classification of lung nodules. METHODS: We recruited 307 patients with high-risk lung nodules who underwent curative surgery; 247 and 60 cases were pathologically confirmed as malignant and benign lung lesions, respectively. Plasma samples were collected from each patient before surgery and subjected to low-depth (5×) whole-genome sequencing. We extracted cell-free DNA characteristics and determined radiomic features. We built models to classify malignancy using our data and further validated the models with 2 independent lung nodule cohorts. RESULTS: Our models using one type of profile were able to distinguish lung cancer from benign lung nodules with areas under the curve of 0.69 to 0.91 in the study cohort. Integrating all 5 base models using cell-free DNA profiles, the cell-free DNA-based ensemble model achieved an area under the curve of 0.95 (95% CI, 0.92-0.97) in the study cohort and 0.98 (95% CI, 0.96-1.00) in the validation cohort. At a specificity of 95.0%, the sensitivity reached 80.0% in the study cohort. With the same threshold, specificity and sensitivity were similar in both validation cohorts. Furthermore, the area under the curve reached 0.97 in both the study and validation cohorts when the radiomic profile was considered. CONCLUSIONS: The cell-free DNA profiles-based method is an efficient noninvasive tool to distinguish malignancies from high-risk but pathologically benign lung nodules.
Affiliation(s)
- Siwei Wang
  - Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China; Clinical Research Institute of Traditional Chinese Medicine, Jiangsu Province Hospital of Chinese Medicine, The Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China
- Fanchen Meng
  - Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China
- Peng Chen
  - Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China
- Yang Lv
  - Department of Information Center, The Affiliated Jiangning Hospital of Nanjing Medical University, Nanjing, Jiangsu, China
- Min Wu
  - Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Nanjing, Jiangsu, China
- Haimeng Tang
  - Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Nanjing, Jiangsu, China
- Hua Bao
  - Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Nanjing, Jiangsu, China
- Xue Wu
  - Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Nanjing, Jiangsu, China
- Yang Shao
  - Geneseeq Research Institute, Nanjing Geneseeq Technology Inc, Nanjing, Jiangsu, China; School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Jie Wang
  - Department of Science and Technology, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China; Biobank of Lung Cancer, Jiangsu Biobank of Clinical Resources, Nanjing, Jiangsu, China
- Juncheng Dai
  - School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China; Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Lin Xu
  - Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China; Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China
- Xiaoxiao Wang
  - Clinical Research Institute of Traditional Chinese Medicine, Jiangsu Province Hospital of Chinese Medicine, The Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, Jiangsu, China.
- Rong Yin
  - Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China; Department of Science and Technology, Nanjing Medical University Affiliated Cancer Hospital & Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research, Nanjing, Jiangsu, China; Biobank of Lung Cancer, Jiangsu Biobank of Clinical Resources, Nanjing, Jiangsu, China; Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu, China.
6
Tsiatsianis GC, Chan CSY, Mouratidis I, Chantzi N, Tsiatsiani AM, Yee NS, Zaravinos A, Kantere V, Georgakopoulos-Soares I. Peptide absent sequences emerging in human cancers. Eur J Cancer 2024; 196:113421. [PMID: 37952501 DOI: 10.1016/j.ejca.2023.113421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/01/2023] [Accepted: 11/01/2023] [Indexed: 11/14/2023]
Abstract
Early diagnosis of cancer can significantly improve survival of cancer patients; however, sensitive and highly specific biomarkers for cancer detection are currently lacking for most cancer types. Nullpeptides are short peptides that are absent from the human proteome. Here, we examined the emergence of nullpeptides during cancer development. We analyzed 3,600,964 somatic mutations across 10,064 whole exome sequencing tumor samples spanning 32 cancer types. We analyzed RNA-seq data from primary tumor samples to identify the subset of nullpeptides that emerge in highly expressed genes. We show that nullpeptides, and particularly the subset that is highly recurrent across cancer patients, can be identified in tumor biopsy samples. We find that cancer genes show an excess of nullpeptides and detect nullpeptide hotspots in specific loci of oncogenes and tumor suppressors. We also observe that recurrent nullpeptides are more likely to be found in neoantigens, which have been shown to be effective targets for immunotherapy, suggesting that they can be used to prioritize candidates. Our findings provide evidence for the utility of nullpeptides as cancer detection and therapeutic biomarkers.
Affiliation(s)
- Georgios Christos Tsiatsianis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA; National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
- Candace S Y Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Anna Maria Tsiatsiani
  - National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece; School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Nelson S Yee
  - Division of Hematology-Oncology, Department of Medicine, Penn State Health Milton S. Hershey Medical Center, Next-Generation Therapies Program, Penn State Cancer Institute, Hershey, PA, USA
- Apostolos Zaravinos
  - Department of Life Sciences, School of Sciences, European University Cyprus, Nicosia 1516, Cyprus; Cancer Genetics, Genomics and Systems Biology Laboratory, Basic and Translational Cancer Research Center (BTCRC), Nicosia 1516, Cyprus
- Verena Kantere
  - School of Electrical Engineering and Computer Science, Faculty of Engineering, University of Ottawa, Canada
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
7
Ali N, Wolf C, Kanchan S, Veerabhadraiah SR, Bond L, Turner MW, Jorcyk CL, Hampikian G. 9S1R nullomer peptide induces mitochondrial pathology, metabolic suppression, and enhanced immune cell infiltration, in triple-negative breast cancer mouse model. Biomed Pharmacother 2024; 170:115997. [PMID: 38118350 PMCID: PMC10872342 DOI: 10.1016/j.biopha.2023.115997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/22/2023] Open
Abstract
Nullomers are the shortest strings of absent amino acid (aa) sequences in a species or group of species. Primes are those nullomers that have not been detected in the genome of any species. 9S1R is a 5-aa peptide prime sequence attached to 5-arginine aa, used to treat triple negative breast cancer (TNBC) in an in vivo mouse model. This unique peptide, administered with a trehalose carrier (9S1R-NulloPT), offers enhanced solubility and exhibits distinct anti-cancer effects against TNBC. In our study, we investigated the effect of 9S1R-NulloPT on tumor growth, metabolism, metastatic burden, tumor immune-microenvironment (TME), and transcriptome of aggressive mouse TNBC tumors. Notably, treated mice had smaller tumors in the initial phase of the treatment, compared with untreated controls, and diminished in vivo and ex vivo bioluminescence at later stages - indicative of metabolically quiescent, dying tumors. The treatment also caused changes in the TME, with increased infiltration of immune cells, and altered the tumor transcriptome, with 365 upregulated genes and 710 downregulated genes. Consistent with in vitro data, downregulated genes were enriched in cellular metabolic processes (179), specifically mitochondrial TCA cycle/oxidative phosphorylation (44), and translation machinery/ribosome biogenesis (45). The upregulated genes were associated with the developmental (13), ECM organization (12) and focal adhesion pathways (7). In conclusion, our study demonstrates that 9S1R-NulloPT effectively reduced tumor growth during its initial phase, altering the TME and tumor transcriptome. The treatment induced mitochondrial pathology which led to a metabolic deceleration in tumors, aligning with in vitro observations.
Affiliation(s)
- Nilufar Ali
  - Department of Biological Sciences, Boise State University, Boise, ID, USA.
- Cody Wolf
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Swarna Kanchan
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, USA
- Shivakumar R Veerabhadraiah
  - Department of Orthopaedics, University of Utah, Salt Lake City, UT, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Laura Bond
  - Center of Biomedical Research Excellence in Matrix Biology, Boise State University, Boise, ID, USA
- Matthew W Turner
  - Biomolecular Research Center, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Cheryl L Jorcyk
  - Department of Biological Sciences, Boise State University, Boise, ID, USA; Biomolecular Research Center, Boise State University, Boise, ID, USA; Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, USA
- Greg Hampikian
  - Department of Biological Sciences, Boise State University, Boise, ID, USA.
8
Mouratidis I, Chantzi N, Khan U, Konnaris MA, Chan CSY, Mareboina M, Moeckel C, Georgakopoulos-Soares I. Frequentmers - a novel way to look at metagenomic next generation sequencing data and an application in detecting liver cirrhosis. BMC Genomics 2023; 24:768. [PMID: 38087204 PMCID: PMC10714505 DOI: 10.1186/s12864-023-09861-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023] Open
Abstract
Early detection of human disease is associated with improved clinical outcomes. However, many diseases are detected at an advanced, symptomatic stage, when patients are past the period of efficacious treatment, which can result in less favorable outcomes. Therefore, methods that can accurately detect human disease at a presymptomatic stage are urgently needed. Here, we introduce "frequentmers": short sequences that are specific to and recurrently observed in either patient or healthy control samples, but not in both. We showcase the utility of frequentmers for the detection of liver cirrhosis using metagenomic Next Generation Sequencing data from stool samples of patients and controls. We develop classification models for the detection of liver cirrhosis and achieve an AUC score of 0.91 using ten-fold cross-validation. A small subset of 200 frequentmers can achieve comparable results in detecting liver cirrhosis. Finally, we identify the microbial organisms in liver cirrhosis samples that are associated with the most predictive frequentmer biomarkers.
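The frequentmer definition in this abstract (recurrent in one group of samples and absent from the other) can be sketched as a set computation over per-sample k-mer sets. This is only an illustration under stated assumptions: the function names, the `min_fraction` recurrence threshold, and the toy reads are hypothetical, and the paper's actual thresholds and preprocessing are not given in the abstract.

```python
from collections import Counter


def sample_kmers(reads, k):
    """Set of k-mers seen in any read of one sample."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def recurrence(samples, k):
    """Number of samples in which each k-mer appears at least once."""
    counts = Counter()
    for reads in samples:
        counts.update(sample_kmers(reads, k))
    return counts


def frequentmers(case_samples, control_samples, k, min_fraction=0.5):
    """K-mers recurrent in one group and never seen in the other.

    `min_fraction` is an assumed recurrence threshold, not a value
    taken from the paper.
    """
    case_counts = recurrence(case_samples, k)
    control_counts = recurrence(control_samples, k)
    case_specific = {
        kmer for kmer, n in case_counts.items()
        if n >= min_fraction * len(case_samples) and kmer not in control_counts
    }
    control_specific = {
        kmer for kmer, n in control_counts.items()
        if n >= min_fraction * len(control_samples) and kmer not in case_counts
    }
    return case_specific, control_specific


# Toy data (hypothetical): two samples per group, one read each.
cases = [["AAAC"], ["AAAG"]]
controls = [["TTTC"], ["TTTG"]]
case_fm, control_fm = frequentmers(cases, controls, k=3, min_fraction=1.0)
```

In a classification setting such as the one the abstract describes, the presence or absence of each selected k-mer in a new sample could then serve as a binary feature.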
Affiliation(s)
- Ioannis Mouratidis
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA.
- Nikol Chantzi
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Umair Khan
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Maxwell A Konnaris
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
  - Department of Statistics, Penn State, University Park, PA, USA
  - Huck Institutes of the Life Sciences, Penn State, University Park, PA, USA
- Candace S Y Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Manvita Mareboina
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Camille Moeckel
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA.
9
Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. bioRxiv 2023:2023.10.08.561337. [PMID: 37873463 PMCID: PMC10592693 DOI: 10.1101/2023.10.08.561337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known, but quantifying this is required to understand the constraints faced by cell systems as they evolve. Here, we use the model organism S. cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, most of the resulting tyrosine phosphorylation is spurious. This provides a suitable system to measure the impact of artificial protein interactions on fitness. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3,500 proteins. Examination of the fitness costs in each strain revealed a strong correlation between the number of spurious pY sites and decreased growth. Moreover, the analysis of pY effects on protein structure and on protein function revealed over 1000 pY events that we predict to be deleterious. However, we also find that a large number of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with bona fide tyrosine kinases. Taken together, our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexander Hogrebe
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Rohan Dandage
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Mario Leutert
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
| | - Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
| | - Alexis Chang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
|
10
|
Ali N, Wolf C, Kanchan S, Veerabhadraiah SR, Bond L, Turner MW, Jorcyk CL, Hampikian G. Nullomer peptide increases immune cell infiltration and reduces tumor metabolism in triple negative breast cancer mouse model. RESEARCH SQUARE 2023:rs.3.rs-3097552. [PMID: 37461536 PMCID: PMC10350184 DOI: 10.21203/rs.3.rs-3097552/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
Background: Nullomers are the shortest strings of absent amino acid (aa) sequences in a species or group of species. Primes are those nullomers that have not been detected in the genome of any species. 9S1R is a 5-aa peptide derived from a prime sequence that is tagged with 5 arginine aa, used to treat triple negative breast cancer (TNBC) in an in vivo TNBC mouse model. 9S1R is administered in trehalose (9S1R-NulloPT), which enhances solubility and exhibits some independent effects against tumor growth and is thus an important component in the drug preparation.
Method: We examined the effect of 9S1R-NulloPT on tumor growth, metabolism, metastatic burden, necrosis, tumor immune microenvironment, and the transcriptome of aggressive mouse TNBC tumors.
Results: The peptide-treated mice had smaller tumors in the initial phase of the treatment, as compared to the untreated control, and reduced in vivo bioluminescence at later stages, which is indicative of metabolically inactive tumors. A decrease in ex vivo bioluminescence was also observed in the excised tumors of treated mice, but not in the secondary metastasis in the lungs. The treatment also caused changes in tumor immune microenvironment with increased infiltration of immune cells and margin inflammation. The treatment upregulated 365 genes and downregulated 710 genes in tumors compared to the untreated group. Consistent with in vitro findings in breast cancer cell lines, downregulated genes in the treated TNBC tumors include Cellular Metabolic Process Related genes (179), specifically mitochondrial genes associated with TCA cycle/oxidative phosphorylation (44), and translation machinery/ribosome biogenesis genes (45). Among upregulated genes, the Developmental Pathway (13), ECM Organization (12) and Focal Adhesion Related Pathways (7) were noteworthy. We also present data from a pilot study using a bilateral BC mouse model, which supports our findings.
Conclusion: In conclusion, although 9S1R-NulloPT was moderate at reducing the tumor volume, it altered the tumor immune microenvironment as well as the tumor transcriptome, rendering tumors metabolically less active by downregulating the mitochondrial function and ribosome biogenesis. This corroborates previously published in vitro findings.
|
11
|
Mouratidis I, Chan CY, Chantzi N, Tsiatsianis G, Hemberg M, Ahituv N, Georgakopoulos-Soares I. Quasi-prime peptides: identification of the shortest peptide sequences unique to a species. NAR Genom Bioinform 2023; 5:lqad039. [PMID: 37101657 PMCID: PMC10124967 DOI: 10.1093/nargab/lqad039] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Revised: 03/02/2023] [Accepted: 04/06/2023] [Indexed: 04/28/2023]
Abstract
Determining the organisms present in a biosample has many important applications in agriculture, wildlife conservation, and healthcare. Here, we develop a universal fingerprint based on the identification of short peptides that are unique to a specific organism. We define quasi-prime peptides as sequences that are found in only one species, and we analyzed proteomes from 21 875 species, from viruses to humans, and annotated the smallest peptide kmer sequences that are unique to a species and absent from all other proteomes. We also perform simulations across all reference proteomes and observe a lower than expected number of peptide kmers across species and taxonomies, indicating an enrichment for nullpeptides, sequences absent from a proteome. For humans, we find that quasi-primes are found in genes enriched for specific gene ontology terms, including proteasome and ATP and GTP catalysis. We also provide a set of quasi-prime peptides for a number of human pathogens and model organisms and further showcase its utility via two case studies for Mycobacterium tuberculosis and Vibrio cholerae, where we identify quasi-prime peptides in two transmembrane and extracellular proteins with relevance for pathogen detection. Our catalog of quasi-prime peptides provides the smallest unit of information that is specific to a single organism at the protein level, providing a versatile tool for species identification.
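The definition of a quasi-prime peptide given in this abstract (a peptide k-mer present in exactly one species and absent from all other proteomes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `kmers`, `quasi_primes`, and `shortest_quasi_primes`, the `max_k` cutoff, and the toy proteomes are all hypothetical, and a real analysis over thousands of proteomes would need a far more memory-efficient index.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(proteomes, k):
    """Map each species to the k-mers found in that species and no other."""
    owners = defaultdict(set)  # k-mer -> set of species containing it
    for species, proteins in proteomes.items():
        for protein in proteins:
            for kmer in kmers(protein, k):
                owners[kmer].add(species)
    unique = {species: set() for species in proteomes}
    for kmer, found_in in owners.items():
        if len(found_in) == 1:
            unique[next(iter(found_in))].add(kmer)
    return unique


def shortest_quasi_primes(proteomes, max_k=6):
    """For each species, find the smallest k that yields a unique k-mer.

    max_k is an arbitrary illustrative cutoff, not a value from the paper.
    Species with no unique k-mer up to max_k are omitted from the result.
    """
    result = {}
    for k in range(1, max_k + 1):
        for species, found in quasi_primes(proteomes, k).items():
            if found and species not in result:
                result[species] = (k, found)
    return result


# Toy proteomes (hypothetical sequences, for illustration only).
toy = {
    "species_a": ["MKTAYIAK", "MKLV"],
    "species_b": ["MKTAYLAK"],
    "species_c": ["MKWV"],
}

for species, (k, peptides) in sorted(shortest_quasi_primes(toy).items()):
    print(species, k, sorted(peptides))
```

In this toy input, `species_a` and `species_c` each have a single residue not seen elsewhere, while `species_b` only becomes distinguishable at k = 2, which mirrors the idea of reporting the smallest unique k-mer length per species.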
Affiliation(s)
- Ioannis Mouratidis
- Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, USA
- Department of Engineering Science, KU Leuven, Leuven, Belgium
| | - Candace S Y Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Nikol Chantzi
- Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, USA
| | - Georgios Christos Tsiatsianis
- Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA, USA
- National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
| | - Martin Hemberg
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, USA
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
|