1. Rahimi A, Gönen M. Efficient Multitask Multiple Kernel Learning With Application to Cancer Research. IEEE Trans Cybern 2022; 52:8716-8728. [PMID: 33705328 DOI: 10.1109/tcyb.2021.3052357] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Multitask multiple kernel learning (MKL) algorithms combine the ability to incorporate different data sources into the prediction model with the ability to use data from one task to improve accuracy on others. However, these methods do not necessarily produce interpretable results. Restricting the solutions to the set of interpretable solutions increases the computational burden of the learning problem significantly, leading to prohibitive run times for some important biomedical applications. We therefore propose a multitask MKL formulation with a clustering of tasks and develop a highly time-efficient solution approach for it. Our solution method is based on Benders decomposition and treats the clustering problem as finding a given number of tree structures in a graph; hence, it is called the forest formulation. We use our method to discriminate early-stage and late-stage cancers using genomic data and gene sets, and we compare our algorithm against two other algorithms. These two algorithms are based on different approaches to linearizing the problem, while all three algorithms make use of the cutting-plane method. Our results indicate that as the number of tasks and/or the number of desired clusters increases, the forest formulation becomes increasingly favorable in terms of computational performance.
2. Rahimi A, Gönen M. A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers. Bioinformatics 2020; 36:3766-3772. [DOI: 10.1093/bioinformatics/btaa168] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/30/2019] [Revised: 03/03/2020] [Accepted: 03/06/2020] [Indexed: 12/13/2022]
Abstract
Motivation
Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.
Results
We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.
Availability and implementation
Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mehmet Gönen
- Department of Industrial Engineering, College of Engineering
- School of Medicine, Koç University, İstanbul 34450, Turkey
- Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, OR 97239, USA
3. Hemap: An Interactive Online Resource for Characterizing Molecular Phenotypes across Hematologic Malignancies. Cancer Res 2019; 79:2466-2479. [DOI: 10.1158/0008-5472.can-18-2970] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/19/2018] [Revised: 02/08/2019] [Accepted: 03/29/2019] [Indexed: 11/16/2022]
4. Todisco G, Manshouri T, Verstovsek S, Masarova L, Pierce SA, Keating MJ, Estrov Z. Chronic lymphocytic leukemia and myeloproliferative neoplasms concurrently diagnosed: clinical and biological characteristics. Leuk Lymphoma 2015; 57:1054-9. [PMID: 26402369 DOI: 10.3109/10428194.2015.1092527] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Chronic lymphocytic leukemia (CLL) and myeloproliferative neoplasms (MPN) may occur concomitantly. However, little is known about the pathobiological characteristics of, and the interaction between, the neoplastic clones in these rare cases of coinciding malignancies. We retrospectively examined the clinical and biological characteristics of 13 patients with concomitant CLL and MPN (eight with primary myelofibrosis [PMF], three with essential thrombocythemia [ET], and two with polycythemia vera [PV]) who presented to our institution between 1998 and 2014, and we tested all patients for MPN-specific aberrations such as JAK2, MPL, and CALR mutations. In addition to characterizing this rare condition epidemiologically and molecularly, we found that a JAK2 mutation could be detected 9 years before PMF was diagnosed, suggesting that the PMF clinical phenotype may take several years to develop and that the clinical co-occurrence of CLL and MPN might be sustained by common molecular events. Some features of these patients suggest that the pathobiologies of these diseases might be intertwined.
Affiliation(s)
- Gabriele Todisco
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Taghi Manshouri
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Srdan Verstovsek
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Lucia Masarova
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sherry A Pierce
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael J Keating
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zeev Estrov
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
5. Knijnenburg TA, Bismeijer T, Wessels LFA, Shmulevich I. A multilevel pan-cancer map links gene mutations to cancer hallmarks. Chin J Cancer 2015; 34:439-49. [PMID: 26369414 PMCID: PMC4593384 DOI: 10.1186/s40880-015-0050-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/27/2015] [Accepted: 07/07/2015] [Indexed: 11/16/2022]
Abstract
Background
A central challenge in cancer research is to create models that bridge the gap between the molecular level, at which interventions can be designed, and the cellular and tissue levels, at which disease phenotypes are manifested. This study was undertaken to construct such a model from functional annotations and to explore its use when integrated with large-scale cancer genomics data.
Methods
We created a map that connects genes to cancer hallmarks via signaling pathways. We projected gene mutation and focal copy number data from various cancer types onto this map and performed statistical analyses to uncover mutually exclusive and co-occurring oncogenic aberrations within this topology.
Results
Our analysis showed that although the genetic fingerprints of tumor types could be very different, there was less variation at the level of hallmarks, consistent with the idea that different genetic alterations have similar functional outcomes. Additionally, we showed how the multilevel map could help to clarify the role of infrequently mutated genes, and we demonstrated that mutually exclusive gene mutations were more prevalent in pathways, whereas many co-occurring gene mutations were associated with hallmark characteristics.
Conclusions
Overlaying this map with gene mutation and focal copy number data from various cancer types makes it possible to investigate the similarities and differences between tumor samples systematically, at the level of not only genes but also pathways and hallmarks.
Electronic supplementary material
The online version of this article (doi:10.1186/s40880-015-0050-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tycho Bismeijer
- Bioinformatics and Statistics, Division of Molecular Carcinogenesis, Netherlands Cancer Institute, 1066 CX, Amsterdam, The Netherlands.
- Lodewyk F A Wessels
- Bioinformatics and Statistics, Division of Molecular Carcinogenesis, Netherlands Cancer Institute, 1066 CX, Amsterdam, The Netherlands.
6. Yli-Hietanen J, Ylipää A, Yli-Harja O. Cancer research in the era of next-generation sequencing and big data calls for intelligent modeling. Chin J Cancer 2015; 34:423-6. [PMID: 25963029 PMCID: PMC4593335 DOI: 10.1186/s40880-015-0008-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/08/2014] [Accepted: 12/10/2014] [Indexed: 11/12/2022]
Abstract
We examine the role of big data and machine learning in cancer research. We describe an example in cancer research where gene-level data from The Cancer Genome Atlas (TCGA) consortium are interpreted using a pathway-level model. As the complexity of computational models increases, their sample requirements grow exponentially. This growth stems from the fact that the number of combinations of variables grows exponentially as the number of variables increases; thus, a large sample size is needed. The number of variables in a computational model can be reduced by incorporating biological knowledge. One particularly successful way of doing this is to use available gene regulatory, signaling, metabolic, or context-specific pathway information. We conclude that the incorporation of existing biological knowledge is essential for progress in using big data for cancer research.
Affiliation(s)
- Jari Yli-Hietanen
- Department of Signal Processing, Tampere University of Technology, P. O. Box 553, Tampere, 33101, Finland.
- Antti Ylipää
- Department of Signal Processing, Tampere University of Technology, P. O. Box 553, Tampere, 33101, Finland.
- Olli Yli-Harja
- Department of Signal Processing, Tampere University of Technology, P. O. Box 553, Tampere, 33101, Finland.
7. Ping Y, Zhang H, Deng Y, Wang L, Zhao H, Pang L, Fan H, Xu C, Li F, Zhang Y, Gong Y, Xiao Y, Li X. IndividualizedPath: identifying genetic alterations contributing to the dysfunctional pathways in glioblastoma individuals. Mol Biosyst 2014; 10:2031-42. [PMID: 24911613 DOI: 10.1039/c4mb00289j] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/13/2023]
Abstract
Owing to the extensive complexity and high heterogeneity of genetic alterations in cancer, comprehensively depicting the molecular mechanisms of cancer remains difficult. Characterizing the personalized pathogenesis of individual cancer patients can help to reveal new details of these complex mechanisms. In this study, we proposed an integrative method called IndividualizedPath to identify genetic alterations and their downstream risk pathways from the perspective of individuals by combining DNA copy number, gene expression data and the topological structures of biological pathways. By applying the method to TCGA glioblastoma multiforme (GBM) samples, we identified 394 gene-pathway pairs in 252 GBM individuals. We found that genes with copy number alterations showed high heterogeneity across GBM individuals, whereas they affected relatively consistent biological pathways. A global landscape of gene-pathway pairs showed that EGFR, linked with multiple cancer-related biological pathways, confers the highest risk of GBM. GBM individuals with MET-pathway pairs showed significantly shorter survival times than those with only MET amplification. Importantly, we found that the same risk pathways were affected by different genes in distinct groups of GBM individuals, with a significant pattern of mutual exclusivity. Similarly, GBM subtype analysis revealed some subtype-specific gene-pathway pairs. In addition, we found that some rare copy number alterations contributed substantially to numerous cancer-related pathways. In summary, our method offers the possibility of identifying personalized cancer mechanisms, and it can be applied to other types of cancer through the web server (http://bioinfo.hrbmu.edu.cn/IndividualizedPath/).
Affiliation(s)
- Yanyan Ping
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
8. Cai B, Jiang X. Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis. Cancer Inform 2014; 13:113-21. [PMID: 25520551 PMCID: PMC4251186 DOI: 10.4137/cin.s13882] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/16/2014] [Revised: 09/05/2014] [Accepted: 09/09/2014] [Indexed: 12/11/2022]
Abstract
Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.
Affiliation(s)
- Binghuang Cai
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Xia Jiang
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
9.
Abstract
Personalized medicine is the cornerstone of medical practice: it tailors treatments to the specific conditions of an affected individual. The borders of personalized medicine are defined by limitations in technology and in our understanding of the biology, physiology and pathology of various conditions. Current advances in technology have provided physicians with the tools to investigate the molecular makeup of disease. Translating this molecular makeup into actionable targets has led to the development of small-molecule inhibitors. A detailed understanding of genetic makeup has also allowed us to develop prognostic markers, better known as companion diagnostics. Current efforts to develop drug delivery systems offer the opportunity to deliver specific inhibitors to affected cells in order to reduce the unwanted side effects of drugs.
Affiliation(s)
- Gayane Badalian-Very
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, 450 Brookline Ave, Boston, MA 02115, United States. Tel.: +1 617 513 7940; fax: +1 617 632 5998.