1
Khadirnaikar S, Shukla S, Prasanna SRM. Integration of pan-cancer multi-omics data for novel mixed subgroup identification using machine learning methods. PLoS One 2023; 18:e0287176. [PMID: 37856446 PMCID: PMC10586677 DOI: 10.1371/journal.pone.0287176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 05/30/2023] [Indexed: 10/21/2023] Open
Abstract
Cancer is a heterogeneous disease, and patients with tumors from different organs can share similar epigenetic and genetic alterations. Therefore, it is crucial to identify the novel subgroups of patients with similar molecular characteristics. It is possible to propose a better treatment strategy when the heterogeneity of the patient is accounted for during subgroup identification, irrespective of the tissue of origin. This work proposes a machine learning (ML) based pipeline for subgroup identification in pan-cancer. Here, mRNA, miRNA, DNA methylation, and protein expression features from pan-cancer samples were concatenated and non-linearly projected to a lower dimension using an ML algorithm. This data was then clustered to identify multi-omics-based novel subgroups. The clinical characterization of these ML subgroups indicated significant differences in overall survival (OS) and disease-free survival (DFS) (p-value<0.0001). The subgroups formed by the patients from different tumors shared similar molecular alterations in terms of immune microenvironment, mutation profile, and enriched pathways. Further, decision-level and feature-level fused classification models were built to identify the novel subgroups for unseen samples. Additionally, the classification models were used to obtain the class labels for the validation samples, and the molecular characteristics were verified. To summarize, this work identified novel ML subgroups using multi-omics data and showed that the patients with different tumor types could be similar molecularly. We also proposed and validated the classification models for subgroup identification. The proposed classification models can be used to identify the novel multi-omics subgroups, and the molecular characteristics of each subgroup can be used to design appropriate treatment regimen.
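The feature-level concatenation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each omics block is a samples-by-features table and that features are z-scored before joining; the abstract states neither the scaling nor the projection and clustering algorithms, so those later steps are left out and every name and value below is hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of a samples x features block."""
    columns = list(zip(*block))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # Constant features carry no information; map them to 0.0.
        scaled_columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def concatenate_omics(blocks):
    """Feature-level fusion: join per-sample feature vectors across omics blocks."""
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all omics blocks must cover the same samples")
    scaled = [zscore_columns(b) for b in blocks]
    return [sum((blk[i] for blk in scaled), []) for i in range(n_samples)]


# Hypothetical toy data: 3 samples, 2 mRNA features and 1 miRNA feature.
mrna = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
mirna = [[5.0], [7.0], [9.0]]
fused = concatenate_omics([mrna, mirna])
```

Each row of `fused` is one sample's combined feature vector, which a dimensionality-reduction and clustering step of the reader's choice could then consume.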
Affiliation(s)
- Seema Khadirnaikar
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
- Sudhanshu Shukla
- Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
- S. R. M. Prasanna
- Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
2
Ahmed SBM, Radwan N, Amer S, Saheb Sharif-Askari N, Mahdami A, Samara KA, Halwani R, Jelinek HF. Assessing the Link between Diabetic Metabolic Dysregulation and Breast Cancer Progression. Int J Mol Sci 2023; 24:11816. [PMID: 37511575 PMCID: PMC10380477 DOI: 10.3390/ijms241411816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 07/19/2023] [Accepted: 07/20/2023] [Indexed: 07/30/2023] Open
Abstract
Diabetes mellitus is a burdensome disease that affects various cellular functions through altered glucose metabolism. Several reports have linked diabetes to cancer development; however, the exact molecular mechanism of how diabetes-related traits contribute to cancer progression is not fully understood. The current study aimed to explore the molecular mechanism underlying the potential effect of hyperglycemia combined with hyperinsulinemia on the progression of breast cancer cells. To this end, gene dysregulation induced by the exposure of MCF7 breast cancer cells to hyperglycemia (HG), or a combination of hyperglycemia and hyperinsulinemia (HGI), was analyzed using a microarray gene expression assay. Hyperglycemia combined with hyperinsulinemia induced differential expression of 45 genes (greater than or equal to two-fold), which were not shared by other treatments. On the other hand, in silico analysis performed using a publicly available dataset (GEO: GSE150586) revealed differential upregulation of 15 genes in the breast tumor tissues of diabetic patients with breast cancer when compared with breast cancer patients with no diabetes. SLC26A11, ALDH1A3, MED20, PABPC4 and SCP2 were among the top upregulated genes in both microarray data and the in silico analysis. In conclusion, hyperglycemia combined with hyperinsulinemia caused a likely unique signature that contributes to acquiring more carcinogenic traits. Indeed, these findings might potentially add emphasis on how monitoring diabetes-related metabolic alteration as an adjunct to diabetes therapy is important in improving breast cancer outcomes. However, further detailed studies are required to decipher the role of the highlighted genes, in this study, in the pathogenesis of breast cancer in patients with a different glycemic index.
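The "greater than or equal to two-fold" criterion in this abstract can be sketched as a log2 fold-change filter. This is only an illustration: the gene names and expression values are hypothetical, and treating the threshold as symmetric (up or down) is an assumption, since the abstract does not say which direction was counted.

```python
import math


def fold_change_hits(control, treated, min_fold=2.0):
    """Return genes whose treated/control ratio is at least min_fold up or down."""
    threshold = math.log2(min_fold)
    hits = {}
    for gene, base in control.items():
        if gene not in treated or base <= 0 or treated[gene] <= 0:
            continue  # skip genes without a usable ratio
        log2_fc = math.log2(treated[gene] / base)
        if abs(log2_fc) >= threshold:
            hits[gene] = log2_fc
    return hits


# Hypothetical expression values (arbitrary units), not data from the study.
control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 80.0}
treated = {"GENE_A": 250.0, "GENE_B": 55.0, "GENE_C": 20.0}
hits = fold_change_hits(control, treated)
```

A real microarray analysis would normally pair such a filter with a statistical test and multiple-testing correction, which this sketch omits.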
Affiliation(s)
- Samrein B M Ahmed
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- College of Health, Wellbeing and Life Sciences, Department of Biosciences and Chemistry, Sheffield Hallam University, Sheffield S1 1WB, UK
- Nada Radwan
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- Sara Amer
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- Narjes Saheb Sharif-Askari
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- Amena Mahdami
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- Kamel A Samara
- College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- Rabih Halwani
- Research Institute of Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- College of Medicine, University of Sharjah, Sharjah 27272, United Arab Emirates
- Herbert F Jelinek
- Department of Biomedical Engineering and Health Engineering Innovation Center, Khalifa University, Abu Dhabi 127788, United Arab Emirates
3
Tiong KL, Sintupisut N, Lin MC, Cheng CH, Woolston A, Lin CH, Ho M, Lin YW, Padakanti S, Yeang CH. An integrated analysis of the cancer genome atlas data discovers a hierarchical association structure across thirty three cancer types. PLOS DIGITAL HEALTH 2022; 1:e0000151. [PMID: 36812605 PMCID: PMC9931374 DOI: 10.1371/journal.pdig.0000151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 10/31/2022] [Indexed: 06/18/2023]
Abstract
Cancer cells harbor molecular alterations at all levels of information processing. Genomic/epigenomic and transcriptomic alterations are inter-related between genes, within and across cancer types and may affect clinical phenotypes. Despite the abundant prior studies integrating cancer multi-omics data, none of them organizes these associations in a hierarchical structure and validates the discoveries in extensive external data. We infer this Integrated Hierarchical Association Structure (IHAS) from the complete data of The Cancer Genome Atlas (TCGA) and compile a compendium of cancer multi-omics associations. Intriguingly, diverse alterations on genomes/epigenomes from multiple cancer types impact transcription of 18 Gene Groups. Half of them are further reduced to three Meta Gene Groups enriched with (1) immune and inflammatory responses, (2) embryonic development and neurogenesis, (3) cell cycle process and DNA repair. Over 80% of the clinical/molecular phenotypes reported in TCGA are aligned with the combinatorial expressions of Meta Gene Groups, Gene Groups, and other IHAS subunits. Furthermore, IHAS derived from TCGA is validated in more than 300 external datasets including multi-omics measurements and cellular responses upon drug treatments and gene perturbations in tumors, cancer cell lines, and normal tissues. To sum up, IHAS stratifies patients in terms of molecular signatures of its subunits, selects targeted genes or drugs for precision cancer therapy, and demonstrates that associations between survival times and transcriptional biomarkers may vary with cancer types. This rich information is critical for the diagnosis and treatment of cancers.
Affiliation(s)
- Khong-Loon Tiong
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Nardnisa Sintupisut
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Min-Chin Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Psomagen, Rockville, Maryland, United States of America
- Chih-Hung Cheng
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Andrew Woolston
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Translational Cancer Immunotherapy & Genomics Lab, Barts Cancer Institute, Charterhouse Square, London, United Kingdom
- Chih-Hsu Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- C3.ai, Redwood City, California, United States of America
- Mirrian Ho
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Yu-Wei Lin
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- AiLife Diagnostics, Pearland, Texas, United States of America
- Sridevi Padakanti
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
- Chen-Hsiang Yeang
- Institute of Statistical Science, Academia Sinica, Section 2, Taipei, Taiwan
4
Cancer stem cell markers interplay with chemoresistance in triple negative breast cancer: A therapeutic perspective. Bull Cancer 2022; 109:960-971. [DOI: 10.1016/j.bulcan.2022.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 04/18/2022] [Accepted: 05/03/2022] [Indexed: 11/19/2022]
5
Gonzalez-Reymundez A, Grueneberg A, Lu G, Alves FC, Rincon G, Vazquez AI. MOSS: multi-omic integration with sparse value decomposition. Bioinformatics 2022; 38:2956-2958. [PMID: 35561193 PMCID: PMC9113319 DOI: 10.1093/bioinformatics/btac179] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/18/2021] [Revised: 03/07/2022] [Accepted: 03/23/2022] [Indexed: 02/03/2023] Open
Abstract
Summary: This article presents multi-omic integration with sparse value decomposition (MOSS), a free and open-source R package for integration and feature selection in multiple large omics datasets. This package is computationally efficient and offers biological insight through capabilities such as cluster analysis and identification of informative omic features.
Availability and implementation: https://CRAN.R-project.org/package=MOSS.
Supplementary information: Supplementary information can be found at https://github.com/agugonrey/GonzalezReymundez2021.
Affiliation(s)
- Alexander Grueneberg
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Guanqi Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Filipe Couto Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Gonzalo Rincon
- Genus PLC Inc., Genome Sciences R&D, De Forest, WI 53532, USA
- Ana I Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
6
Functional stratification of cancer drugs through integrated network similarity. NPJ Syst Biol Appl 2022; 8:11. [PMID: 35440787 PMCID: PMC9018743 DOI: 10.1038/s41540-022-00219-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 01/21/2022] [Indexed: 11/30/2022] Open
Abstract
Drugs not only perturb their immediate protein targets but also modulate multiple signaling pathways. In this study, we explored networks modulated by several drugs across multiple cancer cell lines by integrating their targets with transcriptomic and phosphoproteomic data. As a result, we obtained 236 reconstructed networks covering five cell lines and 70 drugs. A rigorous topological and pathway analysis showed that chemically and functionally different drugs may modulate overlapping networks. Additionally, we revealed a set of tumor-specific hidden pathways with the help of drug network models that are not detectable from the initial data. The difference in the target selectivity of the drugs leads to disjoint networks despite sharing a similar mechanism of action, e.g., HDAC inhibitors. We also used the reconstructed network models to study potential drug combinations based on the topological separation and found literature evidence for a set of drug pairs. Overall, network-level exploration of drug-modulated pathways and their deep comparison may potentially help optimize treatment strategies and suggest new drug combinations.
7
Sukhadia SS, Tyagi A, Venkataraman V, Mukherjee P, Prasad P, Gevaert O, Nagaraj SH. ImaGene: a web-based software platform for tumor radiogenomic evaluation and reporting. BIOINFORMATICS ADVANCES 2022; 2:vbac079. [PMID: 36699376 PMCID: PMC9714320 DOI: 10.1093/bioadv/vbac079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 09/26/2022] [Accepted: 11/09/2022] [Indexed: 11/12/2022]
Abstract
Summary: Radiographic imaging techniques provide insight into the imaging features of tumor regions of interest, while immunohistochemistry and sequencing techniques performed on biopsy samples yield omics data. Relationships between tumor genotype and phenotype can be identified from these data through traditional correlation analyses and artificial intelligence (AI) models. However, the radiogenomics community lacks a unified software platform with which to conduct such analyses in a reproducible manner. To address this gap, we developed ImaGene, a web-based platform that takes tumor omics and imaging datasets as inputs, performs correlation analysis between them, and constructs AI models. ImaGene has several modifiable configuration parameters and produces a report displaying model diagnostics. To demonstrate the utility of ImaGene, we utilized data for invasive breast carcinoma (IBC) and head and neck squamous cell carcinoma (HNSCC) and identified potential associations between imaging features and nine genes (WT1, LGI3, SP7, DSG1, ORM1, CLDN10, CST1, SMTNL2, and SLC22A31) for IBC and eight genes (NR0B1, PLA2G2A, MAL, CLDN16, PRDM14, VRTN, LRRN1, and MECOM) for HNSCC. ImaGene has the potential to become a standard platform for radiogenomic tumor analyses due to its ease of use, flexibility, and reproducibility, playing a central role in the establishment of an emerging radiogenomic knowledge base.
Availability and implementation: www.ImaGene.pgxguide.org, https://github.com/skr1/Imagene.git.
Supplementary information: Supplementary data are available at https://github.com/skr1/Imagene.git.
Affiliation(s)
- Shrey S Sukhadia
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4000, Australia; Translational Research Institute, Brisbane, QLD 4000, Australia
- Aayush Tyagi
- Yardi School of Artificial Intelligence, Indian Institute of Technology, New Delhi 110016, India
- Vivek Venkataraman
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4000, Australia; Translational Research Institute, Brisbane, QLD 4000, Australia
- Pritam Mukherjee
- Stanford Center for Biomedical Informatics Research, Department of Medicine and Biomedical Data Science, Stanford University, Stanford, CA 94305-5101, USA
- Pratosh Prasad
- Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560012, India
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research, Department of Medicine and Biomedical Data Science, Stanford University, Stanford, CA 94305-5101, USA
- Shivashankar H Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, QLD 4000, Australia; Translational Research Institute, Brisbane, QLD 4000, Australia
8
Prokopidis K, Giannos P, Witard OC, Peckham D, Ispoglou T. Aberrant mitochondrial homeostasis at the crossroad of musculoskeletal ageing and non-small cell lung cancer. PLoS One 2022; 17:e0273766. [PMID: 36067173 PMCID: PMC9447904 DOI: 10.1371/journal.pone.0273766] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/13/2022] [Accepted: 08/12/2022] [Indexed: 11/19/2022] Open
Abstract
Cancer cachexia is accompanied by muscle atrophy, sharing multiple common catabolic pathways with sarcopenia, including mitochondrial dysfunction. This study investigated gene expression from skeletal muscle tissues of older healthy adults, who are at risk of age-related sarcopenia, to identify potential gene biomarkers whose dysregulated expression and protein interference were involved in non-small cell lung cancer (NSCLC). Screening of the literature resulted in 14 microarray datasets (GSE25941, GSE28392, GSE28422, GSE47881, GSE47969, GSE59880 in musculoskeletal ageing; GSE118370, GSE33532, GSE19804, GSE18842, GSE27262, GSE19188, GSE31210, GSE40791 in NSCLC). Differentially expressed genes (DEGs) were used to construct protein-protein interaction networks and retrieve clustering gene modules. Overlapping module DEGs were ranked based on 11 topological algorithms and were correlated with prognosis, tissue expression, and tumour purity in NSCLC. The analysis revealed that the dysregulated expression of the mammalian mitochondrial ribosomal proteins, Mitochondrial Ribosomal Protein S26 (MRPS26), Mitochondrial Ribosomal Protein S17 (MRPS17), Mitochondrial Ribosomal Protein L18 (MRPL18) and Mitochondrial Ribosomal Protein L51 (MRPL51) were linked to reduced survival and tumour purity in NSCLC while tissue expression of the same genes followed an opposite direction in healthy older adults. These results support a potential link between the mitochondrial ribosomal microenvironment in ageing muscle and NSCLC. Further studies comparing changes in sarcopenia and NSCLC associated cachexia are warranted.
Affiliation(s)
- Konstantinos Prokopidis
- Society of Meta-Research and Biomedical Innovation, London, United Kingdom
- Department of Musculoskeletal Biology, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, United Kingdom
- Panagiotis Giannos
- Society of Meta-Research and Biomedical Innovation, London, United Kingdom
- Department of Life Sciences, Faculty of Natural Sciences, Imperial College London, London, United Kingdom
- Oliver C. Witard
- Faculty of Life Sciences and Medicine, Centre for Human and Applied Physiological Sciences, King’s College London, London, United Kingdom
- Daniel Peckham
- Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
9
Albaradei S, Napolitano F, Thafar MA, Gojobori T, Essack M, Gao X. MetaCancer: A deep learning-based pan-cancer metastasis prediction model developed using multi-omics data. Comput Struct Biotechnol J 2021; 19:4404-4411. [PMID: 34429856 PMCID: PMC8368987 DOI: 10.1016/j.csbj.2021.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/17/2021] [Revised: 07/19/2021] [Accepted: 08/06/2021] [Indexed: 02/09/2023] Open
Abstract
Predicting metastasis in the early stages means that clinicians have more time to adjust a treatment regimen to target the primary and metastasized cancer. In this regard, several computational approaches are being developed to identify metastasis early. However, most of the approaches focus on changes on one genomic level only, and they are not being developed from a pan-cancer perspective. Thus, we here present a deep learning (DL)-based model, MetaCancer, that differentiates pan-cancer metastasis status based on three heterogeneous data layers. In particular, we built the DL-based model using 400 patients' data that includes RNA sequencing (RNA-Seq), microRNA sequencing (microRNA-Seq), and DNA methylation data from The Cancer Genome Atlas (TCGA). We quantitatively assess the proposed convolutional variational autoencoder (CVAE) and alternative feature extraction methods. We further show that integrating mRNA, microRNA, and DNA methylation data as features improves our model's performance compared to when we used mRNA data only. In addition, we show that the mRNA-related features make a more significant contribution when attempting to distinguish the primary tumors from metastatic ones computationally. Lastly, we show that our DL model significantly outperformed a machine learning (ML) ensemble method based on various metrics.
Affiliation(s)
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Francesco Napolitano
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
10
Yeasmin F, Imamachi N, Tanu T, Taniue K, Kawamura T, Yada T, Akimitsu N. Identification and analysis of short open reading frames (sORFs) in the initially annotated noncoding RNA LINC00493 from human cells. J Biochem 2021; 169:421-434. [PMID: 33386847 DOI: 10.1093/jb/mvaa143] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/01/2020] [Accepted: 09/23/2020] [Indexed: 12/31/2022] Open
Abstract
Whole transcriptome analyses have revealed that mammalian genomes are massively transcribed, resulting in the production of huge numbers of transcripts with unknown functions (TUFs). Previous research has categorized most TUFs as noncoding RNAs (ncRNAs) because most previously studied TUFs do not encode open reading frames (ORFs) with biologically significant lengths [>100 amino acids (AAs)]. Recent studies, however, have reported that several transcripts harbouring small ORFs that encode peptides shorter than 100 AAs are translated and have important biological functions. Here, we examined the translational capacity of transcripts annotated as ncRNAs in human cells, and identified several hundred ribosome-associated transcripts previously annotated as ncRNAs. Ribosome footprinting and polysome profiling analyses revealed that 61 of them are potentially translatable. Among them, 45 were non-nonsense-mediated mRNA decay targets, suggesting that they are productive mRNAs. We confirmed the translation of one ncRNA, LINC00493, by luciferase reporter assay and western blotting of a FLAG-tagged LINC00493 peptide. While proteomic analysis revealed that the LINC00493 peptide interacts with many mitochondrial proteins, immunofluorescence assays showed that its peptide is mitochondrially localized. Our findings indicate that some transcripts annotated as ncRNAs encode peptides and that unannotated peptides may perform important roles in cells.
Affiliation(s)
- Fouzia Yeasmin
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
- Naoto Imamachi
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
- Tanzina Tanu
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
- Kenzui Taniue
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
- Takeshi Kawamura
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
- Tetsushi Yada
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Nobuyoshi Akimitsu
- Isotope Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-Ku, Tokyo 113-0032, Japan
11
Chen H, Li H, Wang L, Li Y, Yang C. A 5-gene DNA methylation signature is a promising prognostic biomarker for early-stage cervical cancer. J OBSTET GYNAECOL 2021; 42:327-332. [PMID: 34082663 DOI: 10.1080/01443615.2021.1907563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
The demographic information and overall survival (OS) of patients with cervical cancer (CC) (pathological stage: IA-IIA) were extracted from the TCGA database. A univariate and multivariate Cox proportional hazard model was performed to identify methylation markers significantly associated with the OS of patients in the training dataset. Then such a prognostic classifier was tested on the validation set and all subgroups. The Kaplan-Meier analysis and ROC analysis were performed to detect the ability to discriminate between patients with different risks and different OS. A DNA methylation signature which contained five genes was found to be significantly associated with the OS of CC patients by the Cox regression analysis in the training dataset. Such a signature could efficiently distinguish the patients into two risk groups with significantly different OS in both datasets. The receiver operating characteristic (ROC) analysis showed it had high sensitivity and specificity. Moreover, such a prognostic model also could be effectively applied to different subgroups, including groups of different ages, tumour sizes, histologic types, etc. A 5-DNA methylation signature identified by this study may act as a novel prognostic indicator for early-stage CC, and it may be helpful for the timely diagnosis and intervention of CC at pathological stages IA-IIA.
Impact Statement
What is already known on this subject? Cervical cancer (CC) is one of the most common gynaecological malignant tumours.
What the results of this study add? This study constructed a risk model based on a 5-DNA methylation signature for early-stage CC patients' survival prediction.
What the implications are of these findings for clinical practice and/or further research? Methylated markers have the potential to discriminate patients of different risks and different OS. Our results may shed new light on the early diagnosis and intervention, and potential therapeutic targets for CC patients at pathological stages IA-IIA.
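A common way to turn a Cox-derived signature into two risk groups is a linear risk score split at the cohort median; a minimal sketch of that idea follows. The abstract does not give the five genes, their coefficients, or the cutoff rule, so the placeholder genes `G1` to `G5`, the coefficient values, the methylation values, and the median cutoff are all assumptions for illustration.

```python
from statistics import median


def risk_score(methylation, coefficients):
    """Linear predictor: sum of coefficient times methylation level per marker."""
    return sum(coefficients[g] * methylation[g] for g in coefficients)


def split_by_median(scores):
    """Label each patient 'high' if above the cohort median score, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical Cox coefficients for five placeholder markers (not from the study).
coefficients = {"G1": 1.2, "G2": -0.8, "G3": 0.5, "G4": 0.9, "G5": -0.3}
# Hypothetical methylation beta values for three patients.
patients = {
    "P1": {"G1": 0.9, "G2": 0.1, "G3": 0.8, "G4": 0.7, "G5": 0.2},
    "P2": {"G1": 0.2, "G2": 0.9, "G3": 0.1, "G4": 0.2, "G5": 0.8},
    "P3": {"G1": 0.5, "G2": 0.5, "G3": 0.5, "G4": 0.5, "G5": 0.5},
}
scores = {pid: risk_score(m, coefficients) for pid, m in patients.items()}
groups = split_by_median(scores)
```

The resulting groups would then be compared with Kaplan-Meier curves and a log-rank test, which are outside the scope of this sketch.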
Affiliation(s)
- Hongxia Chen
- Department of Pathophysiology, School of Basic Medicine, Hubei University of Science and Technology, Xianning, China
- Hongying Li
- Maternal and Child Health Hospital of Hubei Province, Hongshan District, Wuhan, Hubei, China; Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Wuhan University of Science and Technology, Wuhan, Hubei, China
- Lei Wang
- School of Laboratory Medicine, Hubei University of Chinese Medicine, Wuhan, China
- Yaxiong Li
- Information Center of Hubei University of Science and Technology, Xianning, China
- ChunYan Yang
- Department of Public Health Management, School of Basic Medicine, Hubei University of Science and Technology, Xianning, China
12
Hira MT, Razzaque MA, Angione C, Scrivens J, Sawan S, Sarker M. Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Sci Rep 2021; 11:6265. [PMID: 33737557 PMCID: PMC7973750 DOI: 10.1038/s41598-021-85285-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/19/2020] [Accepted: 02/28/2021] [Indexed: 02/06/2023] Open
Abstract
Cancer is a complex disease that deregulates cellular functions at various molecular levels (e.g., DNA, RNA, and proteins). Integrated multi-omics analysis of data from these levels is necessary to understand the aberrant cellular functions accountable for cancer and its development. In recent years, Deep Learning (DL) approaches have become a useful tool in integrated multi-omics analysis of cancer data. However, high dimensional multi-omics data are generally imbalanced, with too many molecular features and relatively few patient samples. This imbalance makes a DL-based integrated multi-omics analysis difficult. DL-based dimensionality reduction techniques, including the variational autoencoder (VAE), are a potential solution to balance high dimensional multi-omics data. However, there are few VAE-based integrated multi-omics analyses, and they are limited to pan-cancer. In this work, we performed an integrated multi-omics analysis of ovarian cancer using the compressed features learned through VAE and an improved version of VAE, namely Maximum Mean Discrepancy VAE (MMD-VAE). First, we designed and developed a DL architecture for VAE and MMD-VAE. Then we used the architecture for mono-omics, integrated di-omics and tri-omics data analysis of ovarian cancer through cancer sample identification, molecular subtype clustering and classification, and survival analysis. The results show that MMD-VAE and VAE-based compressed features can classify the transcriptional subtypes of the TCGA datasets with accuracies in the ranges of 93.2-95.5% and 87.1-95.7%, respectively. Also, survival analysis results show that VAE and MMD-VAE based compressed representations of omics data can be used in cancer prognosis.
Based on the results, we can conclude that (i) VAE and MMD-VAE outperform existing dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better than or similarly to their mono-omics counterparts, and (iii) MMD-VAE performs better than VAE on most omics datasets.
Affiliation(s)
- Muta Tah Hira
  - School of Health and Life Sciences, Teesside University, Middlesbrough, TS4 3BX, UK
- M A Razzaque
  - School of Computing, Eng. & Digital Tech., Teesside University, Middlesbrough, TS4 3BX, UK
- Claudio Angione
  - School of Computing, Eng. & Digital Tech., Teesside University, Middlesbrough, TS4 3BX, UK
- James Scrivens
  - School of Health and Life Sciences, Teesside University, Middlesbrough, TS4 3BX, UK
- Saladin Sawan
  - The James Cook University Hospital, Middlesbrough, TS4 3BW, UK
- Mosharraf Sarker
  - School of Health and Life Sciences, Teesside University, Middlesbrough, TS4 3BX, UK
13
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
14
Pan-cancer driver copy number alterations identified by joint expression/CNA data analysis. Sci Rep 2020; 10:17199. [PMID: 33057153 PMCID: PMC7566486 DOI: 10.1038/s41598-020-74276-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Accepted: 09/29/2020] [Indexed: 02/07/2023]
Abstract
Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome after doing a comprehensive analysis combining both gene expression and CNA data from The Cancer Genome Atlas.
15
Feltes BC, Poloni JDF, Nunes IJG, Faria SS, Dorn M. Multi-Approach Bioinformatics Analysis of Curated Omics Data Provides a Gene Expression Panorama for Multiple Cancer Types. Front Genet 2020; 11:586602. [PMID: 33329726 PMCID: PMC7719697 DOI: 10.3389/fgene.2020.586602] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/23/2020] [Accepted: 10/09/2020] [Indexed: 12/19/2022]
Abstract
Studies describing the expression patterns and biomarkers for the tumoral process increase in number every year. The availability of new datasets, although essential, also creates a confusing landscape where common or critical mechanisms are obscured amidst the divergent and heterogeneous nature of such results. In this work, we manually curated the Gene Expression Omnibus using rigorous filtering criteria to select the most homogeneous and highest quality microarray and RNA-seq datasets from multiple types of cancer. By applying systems biology approaches, combined with machine learning analysis, we investigated possible frequently deregulated molecular mechanisms underlying the tumoral process. Our multi-approach analysis of 99 curated datasets, composed of 5,406 samples, revealed 47 differentially expressed genes in all analyzed cancer types, which were all in agreement with the validation using TCGA data. Results suggest that the tumoral process is more related to the overexpression of core deregulated machinery than the underexpression of a given gene set. Additionally, we identified gene expression similarities between different cancer types not described before and performed an overall survival analysis using 20 cancer types. Finally, we were able to suggest a core regulatory mechanism that could be frequently deregulated.
Affiliation(s)
- Bruno César Feltes
  - Laboratory of Structural Bioinformatics and Computational Biology, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Joice de Faria Poloni
  - Laboratory of Structural Bioinformatics and Computational Biology, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Sara Socorro Faria
  - Laboratory of Immunology and Inflammation, Department of Cell Biology, University of Brasilia, Brasilia, Brazil
- Marcio Dorn
  - Laboratory of Structural Bioinformatics and Computational Biology, Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  - Center of Biotechnology, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
  - National Institute of Science and Technology - Forensic Science, Porto Alegre, Brazil