1
Musilova J, Vafek Z, Puniya BL, Zimmer R, Helikar T, Sedlar K. Augusta: From RNA-Seq to gene regulatory networks and Boolean models. Comput Struct Biotechnol J 2024; 23:783-790. [PMID: 38312198 PMCID: PMC10837063 DOI: 10.1016/j.csbj.2024.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 02/06/2024]
Abstract
Computational models of gene regulation help us understand regulatory mechanisms and are extensively used in a wide range of areas, e.g., biotechnology or medicine, with significant benefits. Unfortunately, only a few computational gene regulatory models of whole genomes allow static and dynamic analysis, due to the lack of sophisticated tools for their reconstruction. Here, we describe Augusta, an open-source Python package for Gene Regulatory Network (GRN) and Boolean Network (BN) inference from high-throughput gene expression data. Augusta can reconstruct genome-wide models suitable for static and dynamic analyses. Augusta uses a unique approach in which the first estimate of a GRN inferred from expression data is further refined by predicting transcription factor binding motifs in promoters of regulated genes and by incorporating verified interactions obtained from databases. Moreover, the refined GRN is transformed into a draft BN by searching the curated model database and assigning logical rules to the incoming edges of target genes; these rules can be further edited manually, as the model is provided in the SBML file format. The approach is applicable even if information about the organism under study is not available in the databases, which is typically the case for non-model organisms, including most microbes. Augusta can be operated from the command line and is thus easy to use for automated prediction of models for various genomes. The Augusta package is freely available at github.com/JanaMus/Augusta. Documentation and tutorials are available at augusta.readthedocs.io.
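The GRN-to-BN step described above can be illustrated with a small sketch. It is not Augusta's implementation or API: the signed edge list, the gene names, and the rule template (a target is ON when at least one activator is ON and no repressor is ON) are hypothetical choices made for illustration, and no SBML file is written.

```python
from typing import Dict, List, Tuple

# Hypothetical signed edge list: (regulator, target, sign); +1 activates, -1 represses.
Edge = Tuple[str, str, int]
Rules = Dict[str, Tuple[List[str], List[str]]]


def rules_from_grn(edges: List[Edge]) -> Rules:
    """Group the incoming edges of each target gene into activators and repressors."""
    rules: Rules = {}
    for regulator, target, sign in edges:
        activators, repressors = rules.setdefault(target, ([], []))
        (activators if sign > 0 else repressors).append(regulator)
    return rules


def step(state: Dict[str, bool], rules: Rules) -> Dict[str, bool]:
    """One synchronous update of the Boolean network.

    Assumed rule template: a target is ON if any activator is ON and no repressor
    is ON; a target with only repressors is ON unless repressed. Genes without
    incoming edges keep their current value.
    """
    new_state = dict(state)
    for target, (activators, repressors) in rules.items():
        active = any(state[a] for a in activators) if activators else True
        repressed = any(state[r] for r in repressors)
        new_state[target] = active and not repressed
    return new_state


if __name__ == "__main__":
    rules = rules_from_grn([("tfA", "geneX", 1), ("tfB", "geneX", -1), ("geneX", "geneY", 1)])
    state = {"tfA": True, "tfB": False, "geneX": False, "geneY": False}
    for _ in range(2):
        state = step(state, rules)
    print(state)  # geneX switches on in the first step, geneY in the second
```

Synchronous updating is used here only for simplicity; a draft model whose rules have been reviewed could equally be simulated with an asynchronous scheme.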
Affiliation(s)
- Jana Musilova
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Zdenek Vafek
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Institute of Forensic Engineering, Brno University of Technology, Brno 61200, Czech Republic
- Bhanwar Lal Puniya
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Ralf Zimmer
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
- Tomas Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln 68588, NE, USA
- Karel Sedlar
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno 61600, Czech Republic
- Department of Informatics, Ludwig-Maximilians-Universität München, Munich 80539, Germany
2
Nguyen TA, Le MK, Nguyen PT, Tran NQV, Kondo T, Nakao A. SLC22A3 that encodes organic cation transporter-3 is associated with prognosis and immunogenicity of human lung squamous cell carcinoma. Transl Lung Cancer Res 2023; 12:1972-1986. [PMID: 38025816 PMCID: PMC10654437 DOI: 10.21037/tlcr-23-334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/26/2023] [Indexed: 12/01/2023]
Abstract
Background: SLC22A3, the gene that encodes organic cation transporter (OCT)-3, has been linked to the prognosis of several types of cancer. However, its role in lung squamous cell carcinoma (LSCC) has not yet been addressed. Methods: We analyzed gene expression, DNA methylation, and clinicopathological data from The Cancer Genome Atlas - Lung Squamous Cell Carcinoma (TCGA-LUSC) (n=501), a publicly available database consisting exclusively of LSCC patients. Using a 5 FPKM (fragments per kilobase of exon per million mapped fragments) cut-off, we divided LSCC patients into two groups: patients with tumors possessing high and low SLC22A3 expression (SLC22A3-high and SLC22A3-low, respectively). Prognostic significance was determined through Cox analyses and Kaplan-Meier curves for overall survival (OS) and disease-free survival (DFS). Differentially methylated position (DMP), differential gene expression, and pathway analyses were performed. Validation was carried out in GSE74777 (n=107), GSE37745 (n=66), GSE162520 (n=45) and GSE161537 (n=17). Results: SLC22A3-high LSCC patients had lower OS and DFS rates than SLC22A3-low LSCC patients. The different expression levels of SLC22A3 in LSCC correlated with the methylation status of the SLC22A3 gene. Pathway analysis indicated that SLC22A3 expression levels were positively correlated with immune-related pathways, such as the inflammatory response, and with the abundance of infiltrating immune cells in the tumor microenvironment (TME). Notably, in the SLC22A3-high group, many genes encoding inhibitory immune checkpoint molecules were upregulated. In addition, SLC22A3 expression positively correlated with the Hot Oral Tumor (HOT) score, indicating high tumor immunogenicity. Conclusions: These findings suggest that high expression of SLC22A3 is associated with poor prognosis and high immunogenicity in LSCC tumors.
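FPKM normalizes a gene's fragment count by exon length in kilobases and by library size in millions of mapped fragments, so FPKM = count × 10^9 / (exon length in bp × total mapped fragments). The sketch below applies that formula and the 5 FPKM cut-off; the counts, length, and library size are made-up values, and assigning a value exactly at the cut-off to the high group is an assumption, because the abstract does not say how such ties were handled.

```python
def fpkm(fragment_count: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    FPKM = count / (length_bp / 1e3) / (total_mapped / 1e6)
         = count * 1e9 / (length_bp * total_mapped)
    """
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def expression_group(fpkm_value: float, cutoff: float = 5.0) -> str:
    """Split tumors at the cut-off (assumption: a value equal to the cut-off counts as high)."""
    return "SLC22A3-high" if fpkm_value >= cutoff else "SLC22A3-low"


if __name__ == "__main__":
    # Made-up numbers: 2,000 bp of exon, 20 million mapped fragments in the library.
    for count in (500, 60):
        value = fpkm(count, 2_000, 20_000_000)
        print(count, value, expression_group(value))
```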
Affiliation(s)
- Thuy-An Nguyen
- Department of Immunology, Faculty of Medicine, University of Yamanashi, Yamanashi, Japan
- Minh-Khang Le
- Department of Human Pathology, University of Yamanashi, Yamanashi, Japan
- Phuc-Tan Nguyen
- Department of Immunology, Faculty of Medicine, University of Yamanashi, Yamanashi, Japan
- Nguyen Quoc Vuong Tran
- Department of Immunology, Faculty of Medicine, University of Yamanashi, Yamanashi, Japan
- Tetsuo Kondo
- Department of Human Pathology, University of Yamanashi, Yamanashi, Japan
- Atsuhito Nakao
- Department of Immunology, Faculty of Medicine, University of Yamanashi, Yamanashi, Japan
- Yamanashi GLIA Center, University of Yamanashi, Yamanashi, Japan
- Atopy Research Center, Juntendo University School of Medicine, Tokyo, Japan
3
Marin L, Casado F. Prediction of prostate cancer biochemical recurrence by using discretization supports the critical contribution of the extra-cellular matrix genes. Sci Rep 2023; 13:10144. [PMID: 37349324 PMCID: PMC10287745 DOI: 10.1038/s41598-023-35821-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 05/24/2023] [Indexed: 06/24/2023]
Abstract
Due to its complexity, much effort has been devoted to developing biomarkers for prostate cancer, which have acquired the utmost clinical relevance for diagnosis and grading. However, all of these advances are limited by the relatively high rate of biochemical recurrence (BCR) and the limited strategies for follow-up. This work proposes a methodology that uses discretization to predict prostate cancer BCR while optimizing the necessary variables. We used discretization of RNA-seq data to improve the prediction of biochemical recurrence and retrieved a subset of ten genes functionally known to be related to tissue structure. Equal-width and equal-frequency data discretization methods were compared to isolate, simultaneously, the contribution of the genes and their interval of action. Adding a robust clinical biomarker such as prostate-specific antigen (PSA) improved the prediction of BCR. Discretization allowed classifying the cancer patients with an accuracy of 82% on testing datasets and 75% on a validation dataset when a five-bin equal-width discretization was used. After data pre-processing, feature selection and classification, our predictions had a precision of 71% (testing datasets: MSKCC and GSE54460) and 69% (validation dataset: GSE70769) for patients presenting BCR up to 24 months after their final treatment. These results emphasize the use of equal-width discretization as a pre-processing step to improve classification for a limited number of genes in the signature. Functionally, many of these genes have a direct or expected role in tissue structure and extracellular matrix organization. The processing steps presented in this study are also applicable to other cancer types to increase the speed and accuracy of the models in diverse datasets.
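The two discretization schemes compared above can be contrasted in a few lines of Python. This is an illustrative sketch on made-up values, not the authors' pipeline: equal width splits the range [min, max] into k intervals of the same length, whereas equal frequency assigns bins by rank so that each bin holds about the same number of samples.

```python
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k intervals of identical width spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical: everything goes into bin 0
        return [0] * len(values)
    # The maximum value would map to index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Assign bins by rank so that each bin holds (about) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


if __name__ == "__main__":
    expression = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up values with one outlier
    print(equal_width_bins(expression, 5))
    print(equal_frequency_bins(expression, 5))
```

With one outlier in the input, equal-width binning places the four small values in bin 0 and the outlier alone in bin 4, while equal-frequency binning spreads the five values across all five bins.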
Affiliation(s)
- Laura Marin
- Department of Engineering, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru
- Institute of Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru
- Fanny Casado
- Institute of Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Av. Universitaria 1801, San Miguel, 15088, Lima, Peru
4
GReNaDIne: A Data-Driven Python Library to Infer Gene Regulatory Networks from Gene Expression Data. Genes (Basel) 2023; 14:genes14020269. [PMID: 36833196 PMCID: PMC9957546 DOI: 10.3390/genes14020269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 01/09/2023] [Accepted: 01/18/2023] [Indexed: 01/22/2023]
Abstract
Context: Inferring gene regulatory networks (GRNs) from high-throughput gene expression data is a challenging task for which different strategies have been developed. Nevertheless, no single method performs best in all settings, and each method has its own advantages, intrinsic biases, and application domains. Thus, to analyze a dataset, users should be able to test different techniques and choose the most appropriate one. This step can be particularly difficult and time-consuming, since most methods' implementations are made available independently, possibly in different programming languages. An open-source library containing different inference methods within a common framework is expected to be a valuable toolkit for the systems biology community. Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 machine learning data-driven gene regulatory network inference methods. It also includes eight generalist preprocessing techniques, suitable for both RNA-seq and microarray dataset analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, this package makes it possible to combine the results of different inference tools to form robust and efficient ensembles. This package has been successfully assessed on the DREAM5 challenge benchmark dataset. The open-source GReNaDIne Python package is freely available in a dedicated GitLab repository, as well as in the Python Package Index (PyPI), the official third-party software repository for Python. The latest documentation of the GReNaDIne library is also available at Read the Docs, an open-source software documentation hosting platform. Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. This package can be used to infer gene regulatory networks from high-throughput gene expression data using different algorithms within the same framework.
To analyze their datasets, users can apply a battery of preprocessing and postprocessing tools, choose the most suitable inference method from the GReNaDIne library, and even combine the outputs of different methods to obtain more robust results. The output format of GReNaDIne is compatible with well-known complementary refinement tools such as pySCENIC.
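One simple way to combine the outputs of several inference methods is rank averaging (a Borda-style count): rank every regulator-target edge within each method, then order edges by their mean rank. The sketch below is an assumption for illustration, not GReNaDIne's API; the method outputs are hypothetical score dictionaries in which a higher score means a more confident edge.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def rank_edges(scores: Dict[Edge, float]) -> Dict[Edge, int]:
    """Rank edges by descending score; rank 1 is the most confident edge."""
    ordered = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}


def ensemble(method_scores: List[Dict[Edge, float]]) -> List[Edge]:
    """Return edges ordered by mean rank across methods (best first).

    Assumes every method scored the same set of candidate edges.
    """
    rankings = [rank_edges(scores) for scores in method_scores]
    mean_rank = {
        edge: sum(ranking[edge] for ranking in rankings) / len(rankings)
        for edge in rankings[0]
    }
    return sorted(mean_rank, key=lambda edge: mean_rank[edge])


if __name__ == "__main__":
    # Hypothetical outputs of two inference methods on the same three candidate edges.
    method_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.5, ("TF2", "g1"): 0.1}
    method_b = {("TF1", "g1"): 0.2, ("TF1", "g2"): 0.8, ("TF2", "g1"): 0.3}
    print(ensemble([method_a, method_b]))
```

Averaging ranks rather than raw scores avoids having to put scores from different methods on a common scale.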
5
Mansouri M, Khakabimamaghani S, Chindelevitch L, Ester M. Aristotle: stratified causal discovery for omics data. BMC Bioinformatics 2022; 23:42. [PMID: 35033007 PMCID: PMC8760642 DOI: 10.1186/s12859-021-04521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2021] [Accepted: 12/08/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND There has been a simultaneous increase in demand for and accessibility of genomics, transcriptomics, proteomics and metabolomics data, collectively known as omics data. This has encouraged widespread application of omics data in the life sciences, from personalized medicine to the discovery of the underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover causes that are specific to a particular stratum of individuals and absent from others. METHODS To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle these challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. RESULTS Evaluation on synthetic data shows that Aristotle discovers true causes better than existing causal discovery methods under different conditions. Experiments on a real dataset on anthracycline cardiotoxicity indicate that Aristotle's predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest directions for further investigation.
Affiliation(s)
- Mehrdad Mansouri
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada
- Sahand Khakabimamaghani
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada
- Leonid Chindelevitch
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada
6
Feng C, Xiang T, Yi Z, Zhao L, He S, Tian K. An Ensemble Model for Tumor Type Identification and Cancer Origins Classification. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:1660-1665. [PMID: 34891604 DOI: 10.1109/embc46164.2021.9629691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Tissue biopsy is widely used in cancer diagnosis. However, manually classifying whether a biopsy is cancerous and, for cancerous ones, the tissue of origin of the tumor requires skilled specialists and sophisticated equipment. As a result, a data-based model is urgently needed. In this paper, we propose a data-based ensemble model for tumor type identification and cancer origin classification. Our model combines different models built on groups of mRNAs that serve distinct functions. Experiments on the TCGA dataset show promising results on both tasks: 98% on tumor type identification and 96.1% on cancer origin classification. We also test our model on external validation datasets, which demonstrate its robustness.
7
Pentrakan A, Yang CC, Wong WK. How Well Does a Sequential Minimal Optimization Model Perform in Predicting Medicine Prices for Procurement System? Int J Environ Res Public Health 2021; 18:ijerph18115523. [PMID: 34063965 PMCID: PMC8196718 DOI: 10.3390/ijerph18115523] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/14/2021] [Revised: 05/14/2021] [Accepted: 05/18/2021] [Indexed: 12/24/2022]
Abstract
The lack of an efficient approach to managing pharmaceutical prices in the procurement system has placed a substantial burden on government budgets. In Thailand, although a reference price policy was implemented to contain drug expenditure, challenges remain with the price dispersion of medicines and the transparency of pricing information. This situation calls for an algorithm to estimate appropriate prices for medical products. To this end, we first developed a model with the sequential minimal optimization (SMO) algorithm to predict the price range of each medicine, using the Waikato Environment for Knowledge Analysis (Weka) software, and applied feature selection techniques to examine whether they improve predictive accuracy. We used a dataset comprising 2424 records listed on the procurement system in Thailand from January to March 2019 and validated the model with 10-fold cross-validation. The results demonstrated that the model derived by the SMO algorithm with the gain ratio selection method performed well, with an accuracy of approximately 92.62% and high sensitivity and precision. Additionally, we found that the model can distinguish price differences among medicines in the pharmaceutical market using eight major features (the segmented buyers, the generic product groups, trade product names, procurement methods, dosage forms, pack sizes, manufacturers, and total purchase budgets), which provided the highest predictive accuracy. Our findings are useful to health policymakers, who could employ our proposed model to monitor medicine prices and give hospital purchasing managers direct feedback suggesting the best possible price based on the feature inputs in their procurement system.
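In 10-fold cross-validation the records are split into ten disjoint folds, and each fold serves once as the test set while the remaining nine are used for training; performance is then summarized over the ten test folds. The sketch below only produces the index splits. The seed and the round-robin fold assignment are assumptions, the folds are not stratified by class, and the SMO classifier itself (run in Weka in the study) is not reproduced.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation over n records."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment: fold sizes differ by at most one record.
    folds = [indices[f::k] for f in range(k)]
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(2424):
        print(len(train), len(test))
```

For the 2424 records in the study, this yields four test folds of 243 records and six of 242.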
Affiliation(s)
- Amarawan Pentrakan
- Department of Healthcare Administration, Asia University, Taichung 41354, Taiwan
- Department of Pharmacy Administration, Faculty of Pharmaceutical Sciences, Prince of Songkla University, Songkhla 90112, Thailand
- Cheng-Chia Yang
- Department of Healthcare Administration, Asia University, Taichung 41354, Taiwan
- Wing-Keung Wong
- Fintech Center, and Big Data Research Center, Department of Finance, Asia University, Taichung 41354, Taiwan
- Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Economics and Finance, The Hang Seng University of Hong Kong, Hong Kong 999077, Hong Kong
8
Del Giudice M, Peirone S, Perrone S, Priante F, Varese F, Tirtei E, Fagioli F, Cereda M. Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology. Int J Mol Sci 2021; 22:ijms22094563. [PMID: 33925407 PMCID: PMC8123853 DOI: 10.3390/ijms22094563] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/20/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 02/01/2023]
Abstract
Artificial intelligence, the discipline of developing computational algorithms able to perform tasks that require human intelligence, offers the opportunity to improve our understanding and delivery of precision medicine. Here, we provide an overview of artificial intelligence approaches for the analysis of large-scale RNA-sequencing datasets in cancer. We present the major solutions for disentangling inter- and intra-tumor heterogeneity of transcriptome profiles to effectively improve patient management. We outline the contributions of learning algorithms to the needs of cancer genomics, from identifying rare cancer subtypes to personalizing therapeutic treatments.
Affiliation(s)
- Marco Del Giudice
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO-IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Serena Peirone
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics and INFN, Università degli Studi di Torino, via P. Giuria 1, 10125 Turin, Italy
- Sarah Perrone
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P. Giuria 1, 10125 Turin, Italy
- Francesca Priante
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Physics, Università degli Studi di Torino, via P. Giuria 1, 10125 Turin, Italy
- Fabiola Varese
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Department of Life Science and System Biology, Università degli Studi di Torino, via Accademia Albertina 13, 10123 Turin, Italy
- Elisa Tirtei
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
- Franca Fagioli
- Paediatric Onco-Haematology Division, Regina Margherita Children’s Hospital, City of Health and Science of Turin, 10126 Turin, Italy
- Department of Public Health and Paediatric Sciences, University of Torino, 10124 Turin, Italy
- Matteo Cereda
- Cancer Genomics and Bioinformatics Unit, IIGM (Italian Institute for Genomic Medicine), c/o IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Candiolo Cancer Institute, FPO-IRCCS, Str. Prov.le 142, km 3.95, 10060 Candiolo, TO, Italy
- Correspondence: Tel.: +39-011-993-3969
9
Bird JJ, Barnes CM, Premebida C, Ekárt A, Faria DR. Country-level pandemic risk and preparedness classification based on COVID-19 data: A machine learning approach. PLoS One 2020; 15:e0241332. [PMID: 33112931 PMCID: PMC7592809 DOI: 10.1371/journal.pone.0241332] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/22/2020] [Accepted: 10/13/2020] [Indexed: 12/23/2022]
Abstract
In this work we present a three-stage machine learning strategy for country-level risk classification based on countries that are reporting COVID-19 information. K% binning discretisation (K = 25) is used to create four risk groups of countries based on the risk of transmission (coronavirus cases per million population), risk of mortality (coronavirus deaths per million population), and risk of inability to test (coronavirus tests per million population). The four risk groups produced by K% binning are labelled 'low', 'medium-low', 'medium-high', and 'high'. Coronavirus-related data are then removed, and the attributes used to predict the three types of risk are the geopolitical and demographic data describing each country. Thus, the class label is calculated from coronavirus data, but the input attributes are country-level information independent of coronavirus data. The three four-class classification problems are then explored and benchmarked through leave-one-country-out cross-validation to find the strongest model, producing a Stack of Gradient Boosting and Decision Tree algorithms for risk of transmission, a Stack of Support Vector Machine and Extra Trees for risk of mortality, and a Gradient Boosting algorithm for risk of inability to test. High risk of inability to test is often coupled with low risks of transmission and mortality; therefore, the risk of inability to test should be interpreted first, before the predicted transmission and mortality risks are considered. Finally, the approach is applied to more recent risk levels using data from September 2020, and weaker results are noted: growing international collaboration reduces the useful knowledge carried by country-level attributes, which suggests that similar machine learning approaches are more useful early, before situations unfold.
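Assuming K% binning means ranking the countries and cutting them into bins that each hold K% of them, K = 25 yields 100 / 25 = 4 quartile groups, which matches the four risk labels. A minimal sketch follows; the country names and values are hypothetical.

```python
from typing import Dict

# The four labels used for the K% bins when K = 25.
LABELS = ["low", "medium-low", "medium-high", "high"]


def k_percent_binning(values: Dict[str, float], k: int = 25) -> Dict[str, str]:
    """Label each country by the K%-wide rank band its value falls into.

    With K = 25 this is quartile (equal-frequency) binning into 100 / K = 4 groups.
    Ties are broken by input order, so equal values may land in different bins.
    """
    n_bins = 100 // k
    if n_bins != len(LABELS):
        raise ValueError("this sketch only defines labels for K = 25")
    ranked = sorted(values, key=values.get)
    return {name: LABELS[rank * n_bins // len(ranked)] for rank, name in enumerate(ranked)}


if __name__ == "__main__":
    # Hypothetical cases per million population for eight made-up countries.
    cases = {"A": 10, "B": 500, "C": 40, "D": 2500, "E": 90, "F": 1200, "G": 5, "H": 300}
    print(k_percent_binning(cases))
```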
Affiliation(s)
- Jordan J. Bird
- Aston Robotics, Vision, and Intelligent Systems Lab (ARVIS), School of Engineering and Applied Science, Aston University, Birmingham, United Kingdom
- Chloe M. Barnes
- Aston Robotics, Vision, and Intelligent Systems Lab (ARVIS), School of Engineering and Applied Science, Aston University, Birmingham, United Kingdom
- Cristiano Premebida
- Department of Electrical and Computer Engineering, Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
- Anikó Ekárt
- Aston Robotics, Vision, and Intelligent Systems Lab (ARVIS), School of Engineering and Applied Science, Aston University, Birmingham, United Kingdom
- Diego R. Faria
- Aston Robotics, Vision, and Intelligent Systems Lab (ARVIS), School of Engineering and Applied Science, Aston University, Birmingham, United Kingdom
10
Shilpi A, Kandpal M, Ji Y, Seagle BL, Shahabi S, Davuluri RV. Platform-Independent Classification System to Predict Molecular Subtypes of High-Grade Serous Ovarian Carcinoma. JCO Clin Cancer Inform 2020; 3:1-9. [PMID: 31002564 PMCID: PMC6873993 DOI: 10.1200/cci.18.00096] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
Abstract
PURPOSE Molecular cancer subtyping is an important tool in predicting prognosis and developing novel precision medicine approaches. We developed a novel platform-independent gene expression-based classification system for molecular subtyping of patients with high-grade serous ovarian carcinoma (HGSOC). METHODS Unprocessed exon array (569 tumor and nine normal) and RNA sequencing (RNA-seq; 376 tumor) HGSOC data sets, with clinical annotations, were downloaded from the Genomic Data Commons portal. Sample clustering was performed by non-negative matrix factorization by using isoform-level expression estimates. The association between the subtypes and overall survival was evaluated by Cox proportional hazards regression model after adjusting for the covariates. A novel classification system was developed for HGSOC molecular subtyping. Robustness and generalizability of the gene signatures were validated using independent microarray and RNA-seq data sets. RESULTS Sample clustering recaptured the four known The Cancer Genome Atlas molecular subtypes but switched the subtype for 22% of the cases, which resulted in significant (P = .006) survival differences among the refined subgroups. After adjusting for covariate effects, the mesenchymal subgroup was found to be at an increased hazard for death compared with the immunoreactive subgroup. Both gene- and isoform-level signatures achieved more than 92% prediction accuracy when tested on independent samples profiled on the exon array platform. When the classifier was applied to RNA-seq data, the subtyping calls agreed with the predictions made from exon array data for 95% of the 279 samples profiled by both platforms. CONCLUSION Isoform-level expression analysis successfully stratifies patients with HGSOC into groups with differing prognosis and has led to the development of robust, platform-independent gene signatures for HGSOC molecular subtyping. 
The association of the refined The Cancer Genome Atlas HGSOC subtypes with overall survival, independent of covariates, enhances the clinical annotation of the HGSOC cohort.
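Sample clustering by non-negative matrix factorization (NMF) factors a non-negative genes × samples matrix V into W (genes × r) and H (r × samples) and then assigns each sample to the factor with the largest coefficient in its column of H. The sketch below uses Lee and Seung's multiplicative updates in plain Python on a tiny made-up matrix; it does not reproduce the study's isoform-level input, its choice of rank, or any consensus step across repeated runs.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def nmf(v: Matrix, rank: int, iterations: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V (genes x samples) as W (genes x rank) times H (rank x samples).

    Lee and Seung multiplicative updates for the squared Frobenius error; entries
    stay non-negative because each update multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h


def cluster_samples(h: Matrix) -> List[int]:
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return [max(range(len(h)), key=lambda r: h[r][j]) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Made-up expression matrix: 4 genes x 4 samples with two obvious sample groups.
    v = [[5, 5, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 6, 6]]
    _, h = nmf(v, rank=2)
    print(cluster_samples(h))
```

Because multiplicative updates only reach a local optimum, NMF-based subtyping is usually run from several random starts; the fixed seed here only makes the sketch reproducible.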
11
Khakabimamaghani S, Kelkar YD, Grande BM, Morin RD, Ester M, Ziemek D. SUBSTRA: Supervised Bayesian Patient Stratification. Bioinformatics 2019; 35:3263-3272. [DOI: 10.1093/bioinformatics/btz112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/11/2018] [Revised: 01/11/2019] [Accepted: 02/13/2019] [Indexed: 11/14/2022]
Abstract
Motivation
Patient stratification methods are key to the vision of precision medicine. Here, we consider transcriptional data to segment the patient population into subsets relevant to a given phenotype. Whereas most existing patient stratification methods focus either on predictive performance or interpretable features, we developed a method striking a balance between these two important goals.
Results
We introduce a Bayesian method called SUBSTRA that uses regularized biclustering to identify patient subtypes and interpretable subtype-specific transcript clusters. The method iteratively re-weights feature importance to optimize phenotype prediction performance by producing more phenotype-relevant patient subtypes. We investigate the performance of SUBSTRA in finding relevant features using simulated data and successfully benchmark it against state-of-the-art unsupervised stratification methods and supervised alternatives. Moreover, SUBSTRA achieves predictive performance competitive with the supervised benchmark methods and provides interpretable transcriptional features in diverse biological settings, such as drug response prediction, cancer diagnosis, or kidney transplant rejection.
Availability and implementation
The R code of SUBSTRA is available at https://github.com/sahandk/SUBSTRA.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yogeshwar D Kelkar
- Computational Systems Immunology, Pfizer Worldwide R&D, Cambridge, MA, USA
- Bruno M Grande
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Ryan D Morin
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
- Martin Ester
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Daniel Ziemek
- Computational Systems Immunology, Pfizer Worldwide R&D, Berlin, Germany
12
Gargallo-Puyuelo CJ, Lanas Á, Asunción García-Gonzalez M. Adding genetic scores to risk models in colorectal cancer. Oncotarget 2019; 10:4803-4804. [PMID: 31448048 PMCID: PMC6690674 DOI: 10.18632/oncotarget.27110] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/30/2019] [Accepted: 07/08/2019] [Indexed: 11/25/2022]
Affiliation(s)
- Carla J Gargallo-Puyuelo
- Department of Gastroenterology, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain; Aragón Health Research Institute, Zaragoza, Spain
- Ángel Lanas
- Department of Gastroenterology, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain; Aragón Health Research Institute, Zaragoza, Spain
- María Asunción García-Gonzalez
- Department of Gastroenterology, Hospital Clínico Universitario Lozano Blesa, Zaragoza, Spain; Aragón Health Research Institute, Zaragoza, Spain