1
Agapito G, Milano M, Cannataro M. A Python Clustering Analysis Protocol of Genes Expression Data Sets. Genes (Basel) 2022; 13:genes13101839. [PMID: 36292724 PMCID: PMC9601308 DOI: 10.3390/genes13101839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 10/05/2022] [Accepted: 10/08/2022] [Indexed: 11/16/2022] [Open access]
Abstract
Gene expression and SNP data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluations. Cluster analysis is used to analyze data that contain no predefined subgroups. The goal is to use the data itself to recognize meaningful and informative subgroups. In addition, cluster analysis supports data reduction, exposes hidden patterns, and generates hypotheses regarding the relationship between genes and phenotypes. Cluster analysis can also be used to identify biomarkers and yield computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results. Therefore, a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides an analysis protocol to effectively analyze gene expression data sets through the K-means and DBSCAN algorithms. The general protocol enables analyzing omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. In addition, to demonstrate the effectiveness of our clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides some best practices and tips to overcome some issues in the analysis of omics data sets through unsupervised learning.
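As a rough illustration of the K-means step named in the abstract, the sketch below runs Lloyd's algorithm on a handful of made-up expression profiles. It is not the authors' protocol: the function name `kmeans`, the toy values in `profiles`, and the deterministic initialisation from the first k points are assumptions chosen only to keep the example self-contained.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's algorithm (illustrative sketch, not the paper's code).

    Assumption: centroids start at the first k points, so results are
    deterministic but depend on input order.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: one row per gene, one column per sample.
profiles = [
    (1.0, 1.2, 0.9),
    (5.0, 5.1, 4.8),
    (0.9, 1.1, 1.0),
    (5.2, 4.9, 5.0),
    (1.1, 0.8, 1.0),
    (4.9, 5.0, 5.3),
]

labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

On real data a library implementation with several random restarts and scaled input would normally be preferable. DBSCAN, the second algorithm the abstract mentions, is not shown here.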
Affiliation(s)
- Giuseppe Agapito
  - Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Corresponding author
- Marianna Milano
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Mario Cannataro
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
2
Abstract
Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes more than would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. In the last decade, along with the spread of the internet, higher availability of computational resources made PEA software tools easy to access and to use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA, which can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others are suggestions of how to correctly generate meaningful results, and some final guidelines indicate some useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.
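The abstract does not say which statistical method measures overrepresentation. One common choice is a one-sided hypergeometric (Fisher-style) test, sketched below under that assumption. The function name `enrichment_p_value` and all gene counts are hypothetical and are not taken from the article.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric tail P(X >= n_overlap).

    n_background: genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the list under study
    n_overlap:    selected genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 background genes, 40 in the pathway,
# 50 selected genes, 6 of which fall in the pathway.
p = enrichment_p_value(n_background=1000, n_pathway=40, n_selected=50, n_overlap=6)
print(f"p = {p:.4g}")
```

When many pathways are tested at once, the resulting p-values would normally also need a multiple-testing correction before interpretation.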
3
Challenges and Limitations of Biological Network Analysis. BIOTECH 2022; 11:biotech11030024. [PMID: 35892929 PMCID: PMC9326688 DOI: 10.3390/biotech11030024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 11/17/2022] [Open access]
Abstract
High-throughput technologies are producing an increasing volume of data that needs large amounts of data storage, effective data models and efficient, possibly parallel analysis algorithms. Pathway and interactomics data are represented as graphs and add a new dimension of analysis, allowing, among other features, graph-based comparison of organisms’ properties. For instance, in biological pathway representation, the nodes can represent proteins, RNA and fat molecules, while the edges represent the interactions between molecules. Biological networks such as protein–protein interaction (PPI) networks, in turn, represent the biochemical interactions among proteins by using nodes that model the proteins from a given organism and edges that model the protein–protein interactions, whereas pathway networks enable the representation of biochemical-reaction cascades that happen within the cells or tissues. In this paper, we discuss the main models for standard representation of pathways and PPI networks, the data models for the representation and exchange of pathway and protein interaction data, the main databases in which they are stored and the alignment algorithms for the comparison of pathways and PPI networks of different organisms. Finally, we discuss the challenges and the limitations of pathway and PPI network representation and analysis. We find that network alignment presents many open problems worthy of further investigation, especially concerning pathway alignment.
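The node-and-edge representation described above can be sketched as an undirected adjacency map. This is only a minimal illustration: the helper names `build_network` and `top_nodes` are hypothetical, and the identifiers `P1` to `P4` are placeholders rather than real proteins or interactions.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected graph: protein -> set of interaction partners."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def top_nodes(adjacency, n):
    """Return the n highest-degree nodes; ties are broken alphabetically."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))[:n]


# Placeholder interaction pairs (not real data).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
network = build_network(edges)

print(top_nodes(network, 2))  # ['P1', 'P2']
```

A set-valued adjacency map ignores duplicate edges and makes degree lookups cheap. Directed or weighted pathway graphs would need a richer structure than this.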
4
Integrative transcriptome analysis identifies deregulated microRNA-transcription factor networks in lung adenocarcinoma. Oncotarget 2018; 7:28920-34. [PMID: 27081085 PMCID: PMC5045367 DOI: 10.18632/oncotarget.8713] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 10/13/2015] [Accepted: 03/28/2016] [Indexed: 01/07/2023] [Open access]
Abstract
Herein, we aimed at identifying global transcriptome microRNA (miRNA) changes and miRNA target genes in lung adenocarcinoma. Samples were selected as training (N = 24) and independent validation (N = 34) sets. Tissues were microdissected to obtain >90% tumor or normal lung cells, subjected to miRNA transcriptome sequencing and TaqMan quantitative PCR validation. We further integrated our data with published miRNA and mRNA expression datasets across 1,491 lung adenocarcinoma and 455 normal lung samples. We identified known and novel, significantly over- and under-expressed (p ≤ 0.01 and FDR≤0.1) miRNAs in lung adenocarcinoma compared to normal lung tissue: let-7a, miR-10a, miR-15b, miR-23b, miR-26a, miR-26b, miR-29a, miR-30e, miR-99a, miR-146b, miR-181b, miR-181c, miR-421, miR-181a, miR-574 and miR-1247. Validated miRNAs included let-7a-2, let-7a-3, miR-15b, miR-21, miR-155 and miR-200b; higher levels of miR-21 expression were associated with lower patient survival (p = 0.042). We identified a regulatory network including miR-15b and miR-155, and transcription factors with prognostic value in lung cancer. Our findings may contribute to the development of treatment strategies in lung adenocarcinoma.
5
Citron F, Armenia J, Franchin G, Polesel J, Talamini R, D'Andrea S, Sulfaro S, Croce CM, Klement W, Otasek D, Pastrello C, Tokar T, Jurisica I, French D, Bomben R, Vaccher E, Serraino D, Belletti B, Vecchione A, Barzan L, Baldassarre G. An Integrated Approach Identifies Mediators of Local Recurrence in Head and Neck Squamous Carcinoma. Clin Cancer Res 2017; 23:3769-3780. [PMID: 28174235 PMCID: PMC7309652 DOI: 10.1158/1078-0432.ccr-16-2814] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/09/2016] [Revised: 12/05/2016] [Accepted: 01/24/2017] [Indexed: 01/06/2023]
Abstract
Purpose: Head and neck squamous cell carcinomas (HNSCCs) cause more than 300,000 deaths worldwide each year. Locoregional and distant recurrences represent worse prognostic events and accepted surrogate markers of patients' overall survival. No valid biomarker and salvage therapy exist to identify and treat patients at high-risk of recurrence. We aimed to verify if selected miRNAs could be used as biomarkers of recurrence in HNSCC. Experimental Design: A NanoString array was used to identify miRNAs associated with locoregional recurrence in 44 patients with HNSCC. Bioinformatic approaches validated the signature and identified potential miRNA targets. Validation experiments were performed using an independent cohort of primary HNSCC samples and a panel of HNSCC cell lines. In vivo experiments validated the in vitro results. Results: Our data identified a four-miRNA signature that classified HNSCC patients at high- or low-risk of recurrence. These miRNAs collectively impinge on the epithelial-mesenchymal transition process. In silico and wet lab approaches showed that miR-9, expressed at high levels in recurrent HNSCC, targets SASH1 and KRT13, whereas miR-1, miR-133, and miR-150, expressed at low levels in recurrent HNSCC, collectively target SP1 and TGFβ pathways. A six-gene signature comprising these targets identified patients at high risk of recurrences, as well. Combined pharmacological inhibition of SP1 and TGFβ pathways induced HNSCC cell death and, when timely administered, prevented recurrence formation in a preclinical model of HNSCC recurrence. Conclusions: By integrating different experimental approaches and competences, we identified critical mediators of recurrence formation in HNSCC that may merit to be considered for future clinical development. Clin Cancer Res; 23(14); 3769-80. ©2017 AACR.
Affiliation(s)
- Francesca Citron
  - Division of Molecular Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Joshua Armenia
  - Division of Molecular Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Giovanni Franchin
  - Oncologic Radiotherapy, CRO Aviano, National Cancer Institute, Aviano, Italy
- Jerry Polesel
  - Cancer Epidemiology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Renato Talamini
  - Cancer Epidemiology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Sara D'Andrea
  - Division of Molecular Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Sandro Sulfaro
  - Division of Pathology, Azienda Ospedaliera Santa Maria degli Angeli, Pordenone, Italy
- Carlo M Croce
  - Department of Cancer Biology and Genetics/CCC, The Ohio State University, Columbus, Ohio
- William Klement
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- David Otasek
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Chiara Pastrello
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Tomas Tokar
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Igor Jurisica
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
  - Departments of Medical Biophysics and Computer Science, University of Toronto, Canada
  - Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
- Deborah French
  - Faculty of Medicine and Psychology, Department of Clinical and Molecular Medicine, University of Rome "La Sapienza," Santo Andrea Hospital, Rome, Italy
- Riccardo Bomben
  - Clinical and Experimental Onco-Hematology Unit, CRO Aviano, National Cancer Institute, Aviano, Italy
- Emanuela Vaccher
  - Medical Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Diego Serraino
  - Cancer Epidemiology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Barbara Belletti
  - Division of Molecular Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
- Andrea Vecchione
  - Department of Cancer Biology and Genetics/CCC, The Ohio State University, Columbus, Ohio
  - Faculty of Medicine and Psychology, Department of Clinical and Molecular Medicine, University of Rome "La Sapienza," Santo Andrea Hospital, Rome, Italy
- Luigi Barzan
  - Department of Surgery, CRO Aviano, National Cancer Institute, Aviano, Italy
- Gustavo Baldassarre
  - Division of Molecular Oncology, CRO Aviano, National Cancer Institute, Aviano, Italy
6
STAT3 pathway regulates lung-derived brain metastasis initiating cell capacity through miR-21 activation. Oncotarget 2016; 6:27461-77. [PMID: 26314961 PMCID: PMC4695002 DOI: 10.18632/oncotarget.4742] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 02/26/2015] [Accepted: 07/13/2015] [Indexed: 12/23/2022] [Open access]
Abstract
Brain metastases (BM) represent the most common tumor to affect the adult central nervous system. Despite the increasing incidence of BM, likely due to consistently improving treatment of primary cancers, BM remain severely understudied. In this study, we utilized patient-derived stem cell lines from lung-to-brain metastases to examine the regulatory role of STAT3 in brain metastasis initiating cells (BMICs). Annotation of our previously described BMIC regulatory genes with protein-protein interaction network mapping identified STAT3 as a novel protein interactor. STAT3 knockdown showed a reduction in BMIC self-renewal and migration, and decreased tumor size in vivo. Screening of BMIC lines with a library of STAT3 inhibitors identified one inhibitor to significantly reduce tumor formation. Meta-analysis identified the oncomir microRNA-21 (miR-21) as a target of STAT3 activity. Inhibition of miR-21 displayed similar reductions in BMIC self-renewal and migration as STAT3 knockdown. Knockdown of STAT3 also reduced expression of known downstream targets of miR-21. Our studies have thus identified STAT3 and miR-21 as cooperative regulators of stemness, migration and tumor initiation in lung-derived BM. Therefore, STAT3 represents a potential therapeutic target in the treatment of lung-to-brain metastases.
7
Inhibition of the hexosamine biosynthetic pathway promotes castration-resistant prostate cancer. Nat Commun 2016; 7:11612. [PMID: 27194471 PMCID: PMC4874037 DOI: 10.1038/ncomms11612] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 01/21/2016] [Accepted: 04/13/2016] [Indexed: 01/22/2023] [Open access]
Abstract
The precise molecular alterations driving castration-resistant prostate cancer (CRPC) are not clearly understood. Using a novel network-based integrative approach, here, we show distinct alterations in the hexosamine biosynthetic pathway (HBP) to be critical for CRPC. Expression of HBP enzyme glucosamine-phosphate N-acetyltransferase 1 (GNPNAT1) is found to be significantly decreased in CRPC compared with localized prostate cancer (PCa). Genetic loss-of-function of GNPNAT1 in CRPC-like cells increases proliferation and aggressiveness, in vitro and in vivo. This is mediated by either activation of the PI3K-AKT pathway in cells expressing full-length androgen receptor (AR) or by specific protein 1 (SP1)-regulated expression of carbohydrate response element-binding protein (ChREBP) in cells containing AR-V7 variant. Strikingly, addition of the HBP metabolite UDP-N-acetylglucosamine (UDP-GlcNAc) to CRPC-like cells significantly decreases cell proliferation, both in vitro and in animal studies, and also demonstrates additive efficacy when combined with enzalutamide in vitro. These observations demonstrate the therapeutic value of targeting HBP in CRPC. The molecular alterations driving anti-androgen resistance in prostate cancer are unclear. Here, the authors show, using a network-based approach, that inhibition of the hexosamine biosynthetic pathway is necessary to develop resistance and that increasing the activity of the pathway enhances the anti-androgen response.
8
Abstract
Tanshinone IIA is a pharmacologically active compound isolated from Danshen (Salvia miltiorrhiza), a traditional Chinese herbal medicine for the management of cardiac diseases and other disorders. However, its underlying molecular mechanisms of action are still unclear. The present investigation utilized a data mining approach based on network pharmacology to uncover the potential protein targets of Tanshinone IIA. Network pharmacology, an integrated multidisciplinary study, incorporates systems biology, network analysis, connectivity, redundancy, and pleiotropy, providing powerful new tools and insights into elucidating the fine details of drug-target interactions. In the present study, two separate drug-target networks for Tanshinone IIA were constructed using the Agilent Literature Search (ALS) and STITCH (search tool for interactions of chemicals) methods. Analysis of the ALS-constructed network revealed a target network with a scale-free topology and five top nodes (protein targets) corresponding to Fos, Jun, Src, phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), and mitogen-activated protein kinase kinase 1 (MAP2K1), whereas analysis of the STITCH-constructed network revealed three top nodes corresponding to cytochrome P450 3A4 (CYP3A4), cytochrome P450 1A1 (CYP1A1), and nuclear factor kappa B1 (NFκB1). The discrepancies were probably due to differences in the computer mining tools and databases employed by the two methods. However, it is conceivable that all eight proteins mediate important biological functions of Tanshinone IIA, contributing to its overall drug-target network. In conclusion, the current results may assist in developing a comprehensive understanding of the molecular mechanisms and signaling pathways of Tanshinone IIA in a simple, compact, and visual manner.
Affiliation(s)
- Shao-Jun Chen
  - Department of Traditional Chinese Medicine, Zhejiang Pharmaceutical College, Ningbo 315100, China
9
Lapin V, Shirdel EA, Wei X, Mason JM, Jurisica I, Mak TW. Kinome-wide screening of HER2+ breast cancer cells for molecules that mediate cell proliferation or sensitize cells to trastuzumab therapy. Oncogenesis 2014; 3:e133. [PMID: 25500906 PMCID: PMC4275559 DOI: 10.1038/oncsis.2014.45] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/30/2014] [Accepted: 08/19/2014] [Indexed: 01/18/2023] [Open access]
Abstract
Understanding the signaling differences that distinguish human HER2-amplified (HER2-positive (HER2+)) breast cancers from other breast cancer subtypes may help to identify protein drug targets for the specific treatment of HER2+ breast cancers. We performed two kinome-wide small interfering RNA (siRNA) screens on five HER2+ breast cancer cell lines, seven breast cancer cell lines in which HER2 was not amplified and two normal breast cell lines. To pinpoint the main kinases driving HER2 signaling, we performed a comprehensive siRNA screen that identified loss of the HER2/HER3 heterodimer as having the most prominent inhibitory effect on the growth of HER2+ breast cancer cells. In a second siRNA screen focused on identifying genes that could sensitize HER2+ cells to trastuzumab treatment, we found that loss of signaling members downstream of phosphatidylinositol 3 kinase (PI3K) potentiated the growth inhibitory effects of trastuzumab. Loss of HER2 and HER3, as well as proteins involved in mitogenic and environmental stress pathways inhibited the proliferation of HER2+ cells only in the absence of trastuzumab, suggesting that these pathways are inhibited by trastuzumab treatment. Loss of essential G2/M cell cycle mediators or proteins involved in vesicle organization exerted inhibitory effects on HER2+ cell growth that were unaffected by trastuzumab. Furthermore, the use of a sensitization index (SI) identified targeting the PI3K pathway to sensitize to trastuzumab treatment. Antagonism using the SI identified MYO3A, MYO3B and MPZL1 as antagonizers to trastuzumab treatment among HER2+ cell lines. Our results suggest that the dimerization partners of HER2 are important for determining the activation of downstream proliferation pathways. Understanding the complex layers of signaling triggered downstream of HER2 homodimers and heterodimers will facilitate the selection of better targets for combination therapies intended to treat HER2+ breast cancers.
Affiliation(s)
- V Lapin
  - Campbell Family Institute for Breast Cancer Research, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- E A Shirdel
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- X Wei
  - Campbell Family Institute for Breast Cancer Research, Toronto, Ontario, Canada
- J M Mason
  - Campbell Family Institute for Breast Cancer Research, Toronto, Ontario, Canada
- I Jurisica
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- T W Mak
  - Campbell Family Institute for Breast Cancer Research, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
10
Pastrello C, Pasini E, Kotlyar M, Otasek D, Wong S, Sangrar W, Rahmati S, Jurisica I. Integration, visualization and analysis of human interactome. Biochem Biophys Res Commun 2014; 445:757-73. [DOI: 10.1016/j.bbrc.2014.01.151] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/21/2013] [Accepted: 01/24/2014] [Indexed: 02/06/2023]
11
Holzinger A, Jurisica I. Knowledge Discovery and Data Mining in Biomedical Informatics: The Future Is in Integrative, Interactive Machine Learning Solutions. INTERACTIVE KNOWLEDGE DISCOVERY AND DATA MINING IN BIOMEDICAL INFORMATICS 2014. [DOI: 10.1007/978-3-662-43968-5_1] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 02/08/2023]
12
Otasek D, Pastrello C, Holzinger A, Jurisica I. Visual Data Mining: Effective Exploration of the Biological Universe. INTERACTIVE KNOWLEDGE DISCOVERY AND DATA MINING IN BIOMEDICAL INFORMATICS 2014. [DOI: 10.1007/978-3-662-43968-5_2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
13
Reconstructing biological gene regulatory networks: where optimization meets big data. EVOLUTIONARY INTELLIGENCE 2013. [DOI: 10.1007/s12065-013-0098-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]