1. Borisov N, Sergeeva A, Suntsova M, Raevskiy M, Gaifullin N, Mendeleeva L, Gudkov A, Nareiko M, Garazha A, Tkachev V, Li X, Sorokin M, Surin V, Buzdin A. Machine Learning Applicability for Classification of PAD/VCD Chemotherapy Response Using 53 Multiple Myeloma RNA Sequencing Profiles. Front Oncol 2021; 11:652063. [PMID: 33937058 PMCID: PMC8083158 DOI: 10.3389/fonc.2021.652063] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/11/2021] [Accepted: 03/19/2021] [Indexed: 12/17/2022]
Abstract
Multiple myeloma (MM) affects ~500,000 people and causes ~100,000 deaths annually; it is currently considered treatable but incurable. There are several MM chemotherapy regimens, eleven of which include bortezomib, a proteasome-targeted drug. MM patients respond differently to bortezomib, and new prognostic biomarkers are needed to personalize treatment. However, there is a shortage of clinically annotated MM molecular data that could be used to establish novel molecular diagnostics. We report new RNA sequencing profiles for 53 MM patients annotated with responses to two similar chemotherapy regimens, PAD (bortezomib, doxorubicin, dexamethasone) and VCD (bortezomib, cyclophosphamide, dexamethasone), or to their combinations. Fourteen patients received both PAD and VCD, six received only PAD, and 33 received only VCD. We compared the profiles of good and poor responders and found five genes that were commonly regulated both here and in previous datasets for other bortezomib regimens (all upregulated in the good responders): FGFR3, MAF, IGHA2, IGHV1-69, and GRB14. Four of these genes are linked with known immunoglobulin locus rearrangements. We then used five machine learning (ML) methods to build classifiers distinguishing good and poor responders in two cohorts: PAD + VCD (53 patients) and, separately, VCD (47 patients). Applying FloWPS dynamic data trimming was beneficial for all ML methods tested in both cohorts, as well as in the previous MM bortezomib datasets. However, ML models built for one dataset could not be transferred to the others, which may be due to differences in treatment regimens, experimental profiling methods, and MM heterogeneity.
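The comparison of good and poor responders described in this abstract can be illustrated with a simple per-gene log2 fold change of mean expression. This is a minimal sketch, not the authors' differential-expression pipeline, which the abstract does not specify; the function names, the pseudocount of 1.0, the threshold of 1.0 (a twofold change), and the toy gene names and values are all hypothetical.

```python
import math
from statistics import mean


def log2_fold_changes(expr, labels, pseudocount=1.0):
    """Per-gene log2 fold change of mean expression, good vs poor responders.

    expr   -- dict: gene name -> list of expression values, one per patient
    labels -- list of 1 (good responder) or 0 (poor responder), same order
    """
    changes = {}
    for gene, values in expr.items():
        good = [v for v, y in zip(values, labels) if y == 1]
        poor = [v for v, y in zip(values, labels) if y == 0]
        # The pseudocount avoids log2(0) for genes not expressed in one group.
        changes[gene] = (math.log2(mean(good) + pseudocount)
                         - math.log2(mean(poor) + pseudocount))
    return changes


def upregulated_in_good(expr, labels, threshold=1.0):
    """Genes whose log2 fold change (good vs poor) is at least `threshold`."""
    changes = log2_fold_changes(expr, labels)
    return sorted(gene for gene, lfc in changes.items() if lfc >= threshold)


# Toy data (hypothetical): two good responders followed by two poor responders.
expr = {"GENE_A": [15, 15, 3, 3], "GENE_B": [7, 7, 7, 7]}
labels = [1, 1, 0, 0]
print(upregulated_in_good(expr, labels))  # ['GENE_A']
```

A real analysis would add a statistical test and multiple-testing correction; the fold change alone only ranks genes by effect size.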
Affiliation(s)
- Nicolas Borisov
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Anna Sergeeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Maria Suntsova
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
- Mikhail Raevskiy
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Nurshat Gaifullin
  - Department of Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
- Larisa Mendeleeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Alexander Gudkov
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
- Maria Nareiko
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Andrew Garazha
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Victor Tkachev
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Xinmin Li
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Maxim Sorokin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Vadim Surin
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Anton Buzdin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States

2. Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020; 13:111. [PMID: 32948183 PMCID: PMC7499993 DOI: 10.1186/s12920-020-00759-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Accepted: 07/27/2020] [Indexed: 12/18/2022]
Abstract
Background: Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This prevents sufficient training of ML classifiers that could be used to improve molecular diagnostics. Methods: We reviewed published datasets of high-throughput gene expression profiles from cancer patients with known responses to chemotherapy treatments, searching the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. Results: We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes: 26 datasets, ranging from 41 to 508 cases per dataset. All identified datasets were checked for ML applicability and robustness with leave-one-out cross-validation. Twenty-three datasets, which had balanced numbers of treatment responders and non-responders, were found suitable for ML. Conclusions: We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Among these datasets, seven contained RNA sequencing data (for 645 cases) and the others contained microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia, and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan, and cyclophosphamide.
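Leave-one-out cross-validation (LOOCV), used in this review to check each dataset, holds out one case at a time, trains on all remaining cases, and scores the prediction for the held-out case. The sketch below assumes a 1-nearest-neighbor classifier as a stand-in for whichever ML method is being checked, and adds a hypothetical `minority_fraction` helper for the class-balance criterion; the abstract gives no balance threshold, so none is hard-coded. The toy data are invented for illustration.

```python
import math
from collections import Counter


def nearest_neighbor(train_X, train_y, x):
    """Stand-in classifier: label of the closest training case (Euclidean)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def loocv_accuracy(X, y, classify=nearest_neighbor):
    """Leave-one-out CV: each case is predicted by a model trained on all others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if classify(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def minority_fraction(y):
    """Share of the smallest class (for two classes, 0.5 means perfectly balanced)."""
    counts = Counter(y)
    if len(counts) < 2:
        return 0.0  # a single class cannot train a responder/non-responder model
    return min(counts.values()) / len(y)


# Toy data (hypothetical): non-responders (0) and responders (1).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
y = [0, 0, 1, 1]
print(loocv_accuracy(X, y), minority_fraction(y))  # 1.0 0.5
```

Passing a different `classify` function swaps in another model without changing the validation loop.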

3. Borisov N, Buzdin A. New Paradigm of Machine Learning (ML) in Personalized Oncology: Data Trimming for Squeezing More Biomarkers From Clinical Datasets. Front Oncol 2019; 9:658. [PMID: 31380288 PMCID: PMC6650540 DOI: 10.3389/fonc.2019.00658] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/22/2019] [Accepted: 07/05/2019] [Indexed: 11/13/2022]
Affiliation(s)
- Nicolas Borisov
  - Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Anton Buzdin
  - Department of Personalized Medicine, I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
  - Department of Genomics and Postgenomic Technologies, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States

4. Tkachev V, Sorokin M, Mescheryakov A, Simonov A, Garazha A, Buzdin A, Muchnik I, Borisov N. FLOating-Window Projective Separator (FloWPS): A Data Trimming Tool for Support Vector Machines (SVM) to Improve Robustness of the Classifier. Front Genet 2019; 9:717. [PMID: 30697229 PMCID: PMC6341065 DOI: 10.3389/fgene.2018.00717] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/01/2018] [Accepted: 12/21/2018] [Indexed: 01/31/2023]
Abstract
Here, we propose a heuristic data-trimming technique for SVM, termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. It prevents SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. As usual, the genetic profiles linked with the outcomes are split into training and validation datasets. The unique property of FloWPS is that irrelevant features of the validation dataset, those that lack a significant number of neighboring hits in the training dataset, are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, FloWPS considers only the proximal points of the training dataset for each point of the validation dataset. Thus, for every validation point, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets for 992 cancer patients who either did or did not respond to various types of chemotherapy. Using leave-one-out cross-validation, we confirmed experimentally that FloWPS significantly increases the quality of classifiers built on classical SVM in most applications, particularly for polynomial kernels.
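The two trimming steps described in this abstract (dropping features that lack training-set support around a validation point, then keeping only the nearest training points) can be sketched as follows. This is an illustration under stated assumptions, not the published FloWPS code: the rule "at least m training values on each side of the validation value" is an assumed reading of "neighboring hits", `m` and `k` are hypothetical parameter names, and a majority vote replaces the SVM that FloWPS trains inside the window.

```python
import math
from collections import Counter


def select_features(train_X, x, m):
    """Indices of features kept for validation point x.

    ASSUMPTION: feature j is kept only if at least m training values lie
    strictly below x[j] and at least m strictly above it, so the classifier
    never has to extrapolate along that feature.
    """
    kept = []
    for j, value in enumerate(x):
        below = sum(1 for row in train_X if row[j] < value)
        above = sum(1 for row in train_X if row[j] > value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def floating_window(train_X, x, m, k):
    """Return (kept feature indices, indices of the k nearest training points)."""
    features = select_features(train_X, x, m)
    if not features:
        raise ValueError("no features survive trimming; try a smaller m")

    def distance(row):
        # Euclidean distance measured only in the retained-feature subspace.
        return math.dist([row[j] for j in features], [x[j] for j in features])

    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i]))
    return features, order[:k]


def predict(train_X, train_y, x, m=1, k=3):
    """Majority vote inside the floating window (stand-in for the SVM step)."""
    _, window = floating_window(train_X, x, m, k)
    votes = Counter(train_y[i] for i in window)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 3.0], [11.0, 2.0], [12.0, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
x = [1.5, 50.0]  # feature 1 lies far outside the training range
print(floating_window(train_X, x, m=1, k=3))  # ([0], [1, 2, 0])
print(predict(train_X, train_y, x))           # 0
```

Because the window is rebuilt for every validation point, the model behind it is effectively retrained per case; in a leave-one-out evaluation, `predict` would be called once per held-out patient with the remaining patients as `train_X`.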
Affiliation(s)
- Victor Tkachev
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Maxim Sorokin
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- Alexander Simonov
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Andrew Garazha
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
- Anton Buzdin
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
- Ilya Muchnik
  - Hill Center, Rutgers University, Piscataway, NJ, United States
- Nicolas Borisov
  - Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States
  - I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia