1
Bonnici V, Chicco D. Seven quick tips for gene-focused computational pangenomic analysis. BioData Min 2024; 17:28. [PMID: 39227987 PMCID: PMC11370085 DOI: 10.1186/s13040-024-00380-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 08/12/2024] [Indexed: 09/05/2024] Open
Abstract
Pangenomics is a relatively new scientific field that investigates the union of all the genomes of a clade. The word pan means everything in ancient Greek; the term pangenomics originally referred to bacterial genomes and was later extended to human genomes as well. Modern bioinformatics offers several tools to analyze pangenomic data, paving the way to an emerging field that we can call computational pangenomics. The computational power now available to the bioinformatics community has made computational pangenomic analyses easy to perform, but this greater accessibility also increases the chance of making mistakes and of producing misleading or inflated results, especially among beginners. To address this problem, we present here a few quick tips for efficient and correct computational pangenomic analyses with a focus on bacterial pangenomics, describing common mistakes to avoid and best practices, drawn from experience, to follow in this field. We believe our recommendations can help readers perform more robust and sound pangenomic analyses and generate more reliable results.
Affiliation(s)
- Vincenzo Bonnici
  - Dipartimento di Scienze Matematiche Fisiche e Informatiche, Università di Parma, Parma, Italy.
- Davide Chicco
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy.
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.
2
Murakami K, Tago SI, Takishita S, Morikawa H, Kojima R, Yokoyama K, Ogawa M, Fukushima H, Takamori H, Nannya Y, Imoto S, Fuji M. Pathogenicity Prediction of Gene Fusion in Structural Variations: A Knowledge Graph-Infused Explainable Artificial Intelligence (XAI) Framework. Cancers (Basel) 2024; 16:1915. [PMID: 38791993 PMCID: PMC11120556 DOI: 10.3390/cancers16101915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 04/26/2024] [Accepted: 05/01/2024] [Indexed: 05/26/2024] Open
Abstract
When analyzing cancer sample genomes in clinical practice, many structural variants (SVs), other than single nucleotide variants (SNVs), are identified. To identify driver variants, the leading candidates must be narrowed down. When fusion genes are involved, selection is particularly difficult, and highly accurate AI predictions are important. Furthermore, we wanted to determine how such predictions can support more reliable diagnoses. Here, we developed an explainable AI (XAI) suitable for SVs with gene fusions, based on the XAI technology we previously developed for predicting SNV pathogenicity. To handle gene fusion variants, we added new data for SVs to the previous knowledge graph and improved the algorithm. Its prediction accuracy was as high as that of existing tools. Moreover, our XAI could explain the reasons for its predictions. We used several variant examples to demonstrate that the reasons are plausible in terms of basic pathogenic mechanisms. These results can be seen as a hopeful step toward the future of genomic medicine, where efficient and correct decisions can be made with the support of AI.
Affiliation(s)
- Katsuhiko Murakami
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
- Shin-ichiro Tago
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
- Sho Takishita
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
- Hiroaki Morikawa
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
- Rikuhiro Kojima
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
- Kazuaki Yokoyama
  - Division of Hematopoietic Disease Control, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Miho Ogawa
  - Division of Hematopoietic Disease Control, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
  - The University of Tokyo Hospital, The University of Tokyo, Tokyo 113-8655, Japan
- Hidehito Fukushima
  - Division of Hematopoietic Disease Control, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Hiroyuki Takamori
  - Division of Hematopoietic Disease Control, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Yasuhito Nannya
  - Division of Hematopoietic Disease Control, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Seiya Imoto
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Masaru Fuji
  - Computing Laboratories, Fujitsu Research, Fujitsu Ltd., Kawasaki 211-8588, Kanagawa, Japan
3
Lovino M, Ficarra E, Martignetti L. Integrated microRNA and proteome analysis of cancer datasets with MoPC. PLoS One 2024; 19:e0289699. [PMID: 38512819 PMCID: PMC10956802 DOI: 10.1371/journal.pone.0289699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 07/25/2023] [Indexed: 03/23/2024] Open
Abstract
MicroRNAs (miRNAs) are small molecules that play an essential role in regulating gene expression by post-transcriptional gene silencing. Their study is crucial in revealing the fundamental processes underlying pathologies and, in particular, cancer. To date, most studies on miRNA regulation consider the effect of specific miRNAs on specific target mRNAs, providing wet-lab validation. However, few tools have been developed to explain miRNA-mediated regulation at the protein level. This paper presents the MoPC computational tool, which relies on the partial correlation between mRNAs and proteins conditioned on miRNA expression to predict miRNA-target interactions in multi-omic datasets. MoPC returns the list of significant miRNA-target interactions and plots the significant correlations on a heatmap in which the miRNAs and targets are ordered by chromosomal location. The software was applied to three TCGA/CPTAC datasets (breast, glioblastoma, and lung cancer), returning enriched results in three independent target databases.
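To make the idea of a partial correlation between mRNA and protein conditioned on miRNA expression concrete, here is a minimal sketch using the textbook first-order formula. It is not MoPC's implementation, which the abstract does not describe. The function name `partial_correlation`, the simulated data, and the effect sizes are all assumptions made for illustration.

```python
import numpy as np


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z.

    Textbook formula built from the three pairwise Pearson correlations.
    """
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Simulated (hypothetical) expression profiles across 200 samples:
# the miRNA lowers both the mRNA and the protein level, so the two
# appear correlated even though they share no other link here.
rng = np.random.default_rng(0)
mirna = rng.normal(size=200)
mrna = -0.8 * mirna + rng.normal(scale=0.5, size=200)
protein = -0.8 * mirna + rng.normal(scale=0.5, size=200)

raw = np.corrcoef(mrna, protein)[0, 1]
partial = partial_correlation(mrna, protein, mirna)
print(f"raw correlation:     {raw:.3f}")
print(f"partial correlation: {partial:.3f}")
```

In this toy setting the raw mRNA-protein correlation is driven entirely by the shared miRNA, so conditioning on the miRNA removes most of it. How a tool turns such a drop into a call of a significant miRNA-target interaction is a separate design decision that this sketch does not cover.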
Affiliation(s)
- Marta Lovino
  - Dipartimento di Ingegneria Enzo Ferrari, University of Modena and Reggio Emilia, Modena, Italy
- Elisa Ficarra
  - Dipartimento di Ingegneria Enzo Ferrari, University of Modena and Reggio Emilia, Modena, Italy
- Loredana Martignetti
  - Institut Curie, INSERM U900, MINES ParisTech, PSL Research University, Paris, France
4
Panicker S, Chengizkhan G, Gor R, Ramachandran I, Ramalingam S. Exploring the Relationship between Fusion Genes and MicroRNAs in Cancer. Cells 2023; 12:2467. [PMID: 37887311 PMCID: PMC10605240 DOI: 10.3390/cells12202467] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Revised: 10/05/2023] [Accepted: 10/12/2023] [Indexed: 10/28/2023] Open
Abstract
Fusion genes are key cancer driver genes that can be used as potential drug targets in precision therapies, and they can also serve as accurate diagnostic and prognostic biomarkers. Fusion genes can cause microRNA (miRNA/miR) aberrations in many types of cancer. Nevertheless, whether fusion genes incite miRNA aberrations as one of their many critical oncogenic functions for driving carcinogenesis needs further investigation. Recent discoveries of miRNA genes located within the regions of genomic rearrangements that initiate fusion gene-based intronic miRNA dysregulation have brought fusion genes into the limelight and revealed their unexplored potential in the field of cancer biology. Fusion gene-based 'promoter-switch' events aberrantly activate the miRNA-related upstream regulatory signals, while fusion-based coding region alterations disrupt the original miRNA coding loci. Fusion genes can potentially regulate miRNA aberrations regardless of the protein-coding capability of the resultant fusion transcript. Studies on out-of-frame and nonrecurrent fusion genes that cause miRNA dysregulation have drawn researchers' attention to fusion genes from an oncological perspective and could therefore have implications for cancer therapies. This review provides insights into the roles of fusion genes and miRNAs, and their possible interrelationships in cancer.
Affiliation(s)
- Saurav Panicker
  - Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India
- Gautham Chengizkhan
  - Department of Endocrinology, Dr. ALM PG Institute of Basic Medical Sciences, University of Madras, Taramani Campus, Chennai 600113, Tamil Nadu, India
- Ravi Gor
  - Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India
- Ilangovan Ramachandran
  - Department of Endocrinology, Dr. ALM PG Institute of Basic Medical Sciences, University of Madras, Taramani Campus, Chennai 600113, Tamil Nadu, India
- Satish Ramalingam
  - Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, Tamil Nadu, India
5
Stefanini M, Lovino M, Cucchiara R, Ficarra E. Predicting gene and protein expression levels from DNA and protein sequences with Perceiver. Comput Methods Programs Biomed 2023; 234:107504. [PMID: 37004267 DOI: 10.1016/j.cmpb.2023.107504] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/25/2022] [Revised: 03/06/2023] [Accepted: 03/21/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND AND OBJECTIVE: The functions of an organism and its biological processes result from the expression of genes and proteins. Therefore, quantifying and predicting mRNA and protein levels is a crucial aspect of scientific research. For mRNA level prediction, the available approaches use the sequence upstream and downstream of the Transcription Start Site (TSS) as input to neural networks. State-of-the-art models (e.g., Xpresso and Basenji) predict mRNA levels using Convolutional (CNN) or Long Short-Term Memory (LSTM) networks. However, CNN prediction depends on the convolutional kernel size, and LSTMs struggle to capture long-range dependencies in the sequence. For protein level prediction, as far as we know, there is no model that predicts protein levels from gene or protein sequences.
METHODS: Here, we use a new model type (called Perceiver) for mRNA and protein level prediction, a Transformer-based architecture with an attention module that attends to long-range interactions in the sequences. In addition, the Perceiver model overcomes the quadratic complexity of standard Transformer architectures. This work's contributions are: (1) the DNAPerceiver model, which predicts mRNA levels from the sequence upstream and downstream of the TSS; (2) the ProteinPerceiver model, which predicts protein levels from the protein sequence; and (3) the Protein&DNAPerceiver model, which predicts protein levels from TSS and protein sequences.
RESULTS: The models are evaluated on cell lines, mice, glioblastoma, and lung cancer tissues. The results show the effectiveness of the Perceiver-type models in predicting mRNA and protein levels.
CONCLUSIONS: This paper presents a Perceiver architecture for mRNA and protein level prediction. In the future, inserting regulatory and epigenetic information into the model could improve mRNA and protein level predictions. The source code is freely available at https://github.com/MatteoStefanini/DNAPerceiver.
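The abstract's remark that the Perceiver avoids the quadratic cost of standard Transformers can be illustrated with a small cross-attention sketch. This is a generic, single-head NumPy illustration of the Perceiver idea, in which a short latent array attends to a long input sequence. It is not the authors' DNAPerceiver code. All sizes, weight matrices, and names (`cross_attention`, `n_tokens`, `n_latents`) are assumptions chosen for the example.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention: latents (M rows) query inputs (N rows)."""
    q = latents @ w_q                         # (M, d) queries from the latent array
    k = inputs @ w_k                          # (N, d) keys from the input sequence
    v = inputs @ w_v                          # (N, d) values from the input sequence
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (M, N) score matrix, not (N, N)
    weights = softmax(scores, axis=-1)        # each latent's weights over all N tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, n_latents, d_in, d = 10_000, 64, 4, 32   # hypothetical sizes

inputs = rng.normal(size=(n_tokens, d_in))   # stand-in for an encoded sequence
latents = rng.normal(size=(n_latents, d))    # stand-in for a learned latent array
w_q = rng.normal(size=(d, d))
w_k = rng.normal(size=(d_in, d))
w_v = rng.normal(size=(d_in, d))

out, weights = cross_attention(latents, inputs, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

With these sizes the score matrix has 64 × 10,000 entries, whereas self-attention over the same input would need 10,000 × 10,000. The cost therefore grows linearly with sequence length for a fixed number of latents.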
Affiliation(s)
- Matteo Stefanini
  - DIEF, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41125, Italy
- Marta Lovino
  - DIEF, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41125, Italy
- Rita Cucchiara
  - DIEF, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41125, Italy
- Elisa Ficarra
  - DIEF, University of Modena and Reggio Emilia, Via Vivarelli 10/1, Modena, 41125, Italy
6
In silico validation of RNA-Seq results can identify gene fusions with oncogenic potential in glioblastoma. Sci Rep 2022; 12:14439. [PMID: 36002559 PMCID: PMC9402576 DOI: 10.1038/s41598-022-18608-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 08/16/2022] [Indexed: 11/08/2022] Open
Abstract
RNA-Sequencing (RNA-Seq) can identify gene fusions in tumors, but not all of these fusions have functional consequences. Using multiple databases, we performed an in silico analysis of fusions detected by RNA-Seq in tumor samples from 139 newly diagnosed glioblastoma patients to identify in-frame fusions with predictable oncogenic potential. Among 61 samples with fusions, there were 103 different fusions, involving 167 different genes, including 20 known oncogenes or tumor suppressor genes (TSGs), 16 associated with cancer but not oncogenes or TSGs, and 32 not associated with cancer but previously shown to be involved in fusions in gliomas. After selecting in-frame fusions able to produce a protein product and running Oncofuse, we identified 30 fusions with predictable oncogenic potential and classified them into four non-overlapping categories: six previously described in cancer; six involving an oncogene or TSG; four predicted by Oncofuse to have oncogenic potential; and 14 other in-frame fusions. Only 24 patients harbored one or more of these 30 fusions, and only two fusions were present in more than one patient: FGFR3::TACC3 and EGFR::SEPTIN14. This in silico study provides a good starting point for the identification of gene fusions with functional consequences in the pathogenesis or treatment of glioblastoma.