1
Demsash AW, Chereka AA, Walle AD, Kassie SY, Bekele F, Bekana T. Machine learning algorithms' application to predict childhood vaccination among children aged 12-23 months in Ethiopia: Evidence 2016 Ethiopian Demographic and Health Survey dataset. PLoS One 2023; 18:e0288867. [PMID: 37851705 PMCID: PMC10584162 DOI: 10.1371/journal.pone.0288867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 07/06/2023] [Indexed: 10/20/2023]
Abstract
INTRODUCTION: Childhood vaccination is a cost-effective public health intervention to reduce child mortality and morbidity. However, vaccination coverage remains low, and previous similar studies have not focused on machine learning algorithms to predict childhood vaccination. As a result, knowledge extraction, association rule formulation, and discovery of insights from hidden patterns in vaccination data are limited. This study therefore aimed to predict childhood vaccination among children aged 12-23 months using the best machine learning algorithm. METHODS: A cross-sectional study design with a two-stage sampling technique was used. A total of 1617 samples of living children aged 12-23 months were used from the 2016 Ethiopian Demographic and Health Survey dataset. The data were pre-processed, and 70% and 30% of the observations were used for training and evaluating the model, respectively. Eight machine learning algorithms were considered for model building and comparison. All the included algorithms were evaluated using confusion matrix elements. The synthetic minority oversampling technique was used to manage imbalanced data. Information gain was used to select important attributes to predict childhood vaccination. If/then logical association was used to generate rules based on relationships among attributes, and Weka version 3.8.6 software was used to perform all the prediction analyses. RESULTS: PART was the best machine learning algorithm for predicting childhood vaccination, with 95.53% accuracy. J48, multilayer perceptron, and random forest were the next best algorithms, with 89.24%, 87.20%, and 82.37% accuracy, respectively. ANC visits, institutional delivery, health facility visits, higher education, and being rich were the top five attributes for predicting childhood vaccination. A total of seven rules were generated that could jointly determine the magnitude of childhood vaccination. Of these, if wealth status = 3 (Rich), adequate ANC visits = 1 (yes), and residency = 2 (Urban), then the probability of childhood vaccination would be 86.73%. CONCLUSIONS: The PART, J48, multilayer perceptron, and random forest algorithms were important algorithms for predicting childhood vaccination. The findings would provide insight into childhood vaccination and serve as a framework for further studies. Strengthening mothers' ANC visits, institutional delivery, improving maternal education, and creating income opportunities for mothers could be important interventions to enhance childhood vaccination.
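The attribute ranking mentioned in this abstract rests on information gain, the drop in class entropy after splitting on an attribute. The following is a minimal sketch of that calculation only; the study itself used Weka, and the function names and the four toy records below are invented for illustration (they are not EDHS data).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: 1 = vaccinated, 0 = not vaccinated.
vaccinated = [1, 1, 0, 0]
anc_visits = ["yes", "yes", "no", "no"]            # separates the classes perfectly
residence = ["urban", "rural", "urban", "rural"]   # carries no information here

gain_anc = information_gain(anc_visits, vaccinated)
gain_residence = information_gain(residence, vaccinated)
```

In this toy setup the first attribute would rank above the second, which is the kind of ordering an information-gain filter produces.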
Affiliation(s)
- Alex Ayenew Chereka
  - Department of Health Informatics, College of Health Science, Mettu University, Mettu, Ethiopia
- Agmasie Damtew Walle
  - Department of Health Informatics, College of Health Science, Mettu University, Mettu, Ethiopia
- Sisay Yitayih Kassie
  - Department of Health Informatics, College of Health Science, Mettu University, Mettu, Ethiopia
- Firomsa Bekele
  - Department of Pharmacy, College of Health Science, Mettu University, Mettu, Ethiopia
- Teshome Bekana
  - Biomedical Science Department, College of Health Science, Mettu University, Mettu, Ethiopia
2
Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023; 10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]
Abstract
Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis. Thus, the output data of such normalizations are poorly compatible with each other. We recently proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with flexible output data format. These criteria dealt with biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs validation datasets and less than half the instability in the calculation of drug efficiency scores compared to other methods.
Affiliation(s)
- Nicolas Borisov
  - Omicsway Corp, Walnut, CA, United States
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander Simonov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
- Maxim Sorokin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Oncobox Ltd., Moscow, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
  - Clinic for Neurosurgery, Laboratory of Experimental Neurooncology, Johannes Gutenberg University Medical Centre, Mainz, Germany
- Denis Kuzmin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Betul Karademir-Yilmaz
  - Department of Biochemistry, School of Medicine/Genetic and Metabolic Diseases Research and Investigation Center (GEMHAM) Marmara University, Istanbul, Türkiye
- Anton Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
3
Li G, Zhang X, Song X, Duan L, Wang G, Xiao Q, Li J, Liang L, Bai L, Bai S. Machine learning for predicting accuracy of lung and liver tumor motion tracking using radiomic features. Quant Imaging Med Surg 2023; 13:1605-1618. [PMID: 36915317 PMCID: PMC10006135 DOI: 10.21037/qims-22-621] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/17/2022] [Accepted: 12/02/2022] [Indexed: 01/11/2023]
Abstract
Background: Internal tumor motion is commonly predicted using external respiratory signals. However, the internal/external correlation is complex and patient-specific. The purpose of this study was to develop various models based on the radiomic features of computed tomography (CT) images to predict the accuracy of tumor motion tracking using external surrogates and to find accurate and reliable tracking algorithms. Methods: Images obtained from a total of 108 and 71 patients pathologically diagnosed with lung and liver cancers, respectively, were examined. Real-time position monitoring motion was fitted to tumor motion, and samples with fitting errors greater than 2 mm were considered positive. Radiomic features were extracted from internal target volumes of average intensity projections, and cross-validation least absolute shrinkage and selection operator (LassoCV) was used to conduct feature selection. Based on the radiomic features, a total of 26 separate models (13 for the lung and 13 for the liver) were trained and tested. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were used to assess performance. Relative standard deviation was used to assess stability. Results: Thirty-three and 22 radiomic features were selected for the lung and liver, respectively. For the lung, the AUC varied from 0.848 (decision tree) to 0.941 [support vector classifier (SVC), logistic regression]; sensitivity varied from 0.723 (extreme gradient boosting) to 0.848 [linear support vector classifier (linearSVC)]; specificity varied from 0.834 (Gaussian naive Bayes) to 0.936 [multilayer perceptron (MLP), wide and deep (W&D)]; and MLP and W&D had better performance and stability than the median. For the liver, the AUC varied from 0.677 [light gradient boosting machine (Light)] to 0.892 (logistic regression); sensitivity varied from 0.717 (W&D) to 0.862 (MLP); specificity varied from 0.566 (Light) to 0.829 (linearSVC); and logistic regression, MLP, and SVC had better performance and stability than the median. Conclusions: Respiratory-sensitive radiomic features extracted from CT images of lung and liver tumors were shown to contain sufficient information to establish an external/internal motion relationship. We developed a rapid and accurate method based on radiomics to classify the accuracy of monitoring a patient's external surface for lung and liver tumor tracking. Several machine learning algorithms, in particular MLP, demonstrated excellent classification performance and stability.
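The evaluation quantities named in this abstract (the 2 mm positive threshold, sensitivity, specificity, and relative standard deviation) can be sketched in a few lines. This is an illustration only: the function names and all numbers below are invented, and the use of the sample standard deviation for the relative standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def label_positive(fit_errors_mm, threshold_mm=2.0):
    """Mark a sample positive when its fitting error exceeds the threshold (2 mm in the abstract)."""
    return [error > threshold_mm for error in fit_errors_mm]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from boolean truth and prediction lists."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


def relative_std(values):
    """Relative standard deviation: sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(values) / mean(values)


# Hypothetical fitting errors (mm) and hypothetical classifier predictions.
y_true = label_positive([0.5, 2.5, 3.0, 1.0, 2.1, 1.9])
y_pred = [False, True, False, False, True, True]
sens, spec = sensitivity_specificity(y_true, y_pred)

# Hypothetical AUC values from repeated runs; a lower value suggests a more stable model.
auc_rsd = relative_std([0.90, 0.92, 0.94])
```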
Affiliation(s)
- Guangjun Li
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Xiangyu Zhang
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Xinyu Song
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Lian Duan
  - Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Guangyu Wang
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Qing Xiao
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jing Li
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Lan Liang
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Long Bai
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Sen Bai
  - Department of Radiation Oncology, Cancer Center and State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
4
Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022; 10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]
Abstract
(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also raised the question of how to compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, but their usefulness is strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages that can be generally classified as either flexible or predefined format harmonizers have been proposed, but none has to date become the gold standard for unification of this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced, and not only for comparisons of limited transcriptomic datasets. Instruments have been proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable calculation costs. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we attempted to overview the landscape of modern approaches and methods in transcriptomic harmonization and focused on the practical aspects of their application.
Affiliation(s)
- Nicolas Borisov
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- Anton Buzdin
  - World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, 117997 Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), 1200 Brussels, Belgium
5
Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022; 2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)
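As a rough illustration of the cancer-to-normal log-fold-change named in the last support protocol, the sketch below computes a generic LFC for a single gene. It is not the protocol's own definition: the base-2 logarithm, the arithmetic means, the pseudocount, and the function name are all assumptions made for this example, and the expression values are invented.

```python
from math import log2
from statistics import mean


def log_fold_change(cancer_values, normal_values, pseudocount=1.0):
    """Generic LFC for one gene: log2 of mean cancer expression over mean normal expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and taking the logarithm of zero for unexpressed genes.
    """
    cancer_mean = mean(cancer_values) + pseudocount
    normal_mean = mean(normal_values) + pseudocount
    return log2(cancer_mean / normal_mean)


# Hypothetical expression values for one gene in two sample groups.
lfc_up = log_fold_change([8.0, 8.0], [2.0, 2.0], pseudocount=0.0)    # fourfold higher in cancer
lfc_down = log_fold_change([2.0, 2.0], [8.0, 8.0], pseudocount=0.0)  # fourfold lower in cancer
```

Positive values indicate higher expression in the cancer group and negative values indicate lower expression, which is why swapping the two groups only flips the sign.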
Affiliation(s)
- Nicolas Borisov
  - Omicsway Corp., Walnut, California
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Maksim Sorokin
  - Omicsway Corp., Walnut, California
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
  - I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Marianna Zolotovskaya
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
  - Oncobox Ltd., Moscow, Russia
- Anton Buzdin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  - World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
6
Arjmand B, Hamidpour SK, Tayanloo-Beik A, Goodarzi P, Aghayan HR, Adibi H, Larijani B. Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer. Front Genet 2022; 13:824451. [PMID: 35154283 PMCID: PMC8829119 DOI: 10.3389/fgene.2022.824451] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/29/2021] [Accepted: 01/10/2022] [Indexed: 12/11/2022]
Abstract
Cancer is defined as a large group of diseases associated with abnormal cell growth and uncontrollable cell division, which may spread to other tissues of the body through metastasis. Cancer matters because its incidence rate is growing worldwide, with major health, economic, and even social impacts on both patients and governments. Early cancer prognosis, diagnosis, and treatment can therefore play a crucial role at the front line of combating cancer. The onset and progression of cancer can occur under the influence of complicated mechanisms and alterations at the level of the genome, proteome, transcriptome, metabolome, etc. Consequently, the advent of omics science and its broad research branches (such as genomics, proteomics, transcriptomics, metabolomics, and so forth) as revolutionary biological approaches has opened new doors to a comprehensive view of the cancer landscape. Because of the complexity of the formation and development of cancer, the study of the mechanisms underlying cancer has gone beyond any single field of the omics arena. Connecting the data from different branches of omics science and examining them in a multi-omics setting can therefore facilitate the discovery of novel prognostic, diagnostic, and therapeutic approaches. As the volume and complexity of data from omics studies in cancer are increasing dramatically, leading-edge technologies such as machine learning can play a promising role in assessing cancer research data. Machine learning is a subset of artificial intelligence that aims at data parsing, classification, and data pattern identification by applying statistical methods and algorithms. The acquired knowledge subsequently allows computers to learn and to improve the accuracy of their predictions through experience with data processing. In this context, the application of machine learning as a novel computational technology offers new opportunities for achieving in-depth knowledge of cancer through analysis of data from multi-omics studies. Therefore, it can be concluded that artificial intelligence technologies such as machine learning can have revolutionary roles in the fight against cancer.
Affiliation(s)
- Babak Arjmand
  - Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
  - Correspondence: Babak Arjmand; Bagher Larijani
- Shayesteh Kokabi Hamidpour
  - Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Akram Tayanloo-Beik
  - Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Parisa Goodarzi
  - Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Hamid Reza Aghayan
  - Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Hossein Adibi
  - Diabetes Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Bagher Larijani
  - Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
7
Anashkina AA, Leberfarb EY, Orlov YL. Recent Trends in Cancer Genomics and Bioinformatics Tools Development. Int J Mol Sci 2021; 22:12146. [PMID: 34830028 PMCID: PMC8618360 DOI: 10.3390/ijms222212146] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/04/2021] [Accepted: 11/08/2021] [Indexed: 02/07/2023]
Affiliation(s)
- Anastasia A. Anashkina
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
- Elena Y. Leberfarb
  - Department of Medicinal Chemistry, Novosibirsk State Medical University, 630091 Novosibirsk, Russia
- Yuriy L. Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, 630090 Novosibirsk, Russia
  - Life Sciences Department, Novosibirsk State University, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
8
Borisov N, Sergeeva A, Suntsova M, Raevskiy M, Gaifullin N, Mendeleeva L, Gudkov A, Nareiko M, Garazha A, Tkachev V, Li X, Sorokin M, Surin V, Buzdin A. Machine Learning Applicability for Classification of PAD/VCD Chemotherapy Response Using 53 Multiple Myeloma RNA Sequencing Profiles. Front Oncol 2021; 11:652063. [PMID: 33937058 PMCID: PMC8083158 DOI: 10.3389/fonc.2021.652063] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/11/2021] [Accepted: 03/19/2021] [Indexed: 12/17/2022]
Abstract
Multiple myeloma (MM) affects ~500,000 people and results in ~100,000 deaths annually, and is currently considered treatable but incurable. There are several MM chemotherapy treatment regimens, among which eleven include bortezomib, a proteasome-targeted drug. MM patients respond differently to bortezomib, and new prognostic biomarkers are needed to personalize treatments. However, there is a shortage of clinically annotated MM molecular data that could be used to establish novel molecular diagnostics. We report new RNA sequencing profiles for 53 MM patients annotated with responses to two similar chemotherapy regimens: bortezomib, doxorubicin, dexamethasone (PAD), and bortezomib, cyclophosphamide, dexamethasone (VCD), or with responses to their combinations. Fourteen patients received both PAD and VCD; six received only PAD, and 33 received only VCD. We compared profiles for the good and poor responders and found five genes commonly regulated here and in the previous datasets for other bortezomib regimens (all upregulated in the good responders): FGFR3, MAF, IGHA2, IGHV1-69, and GRB14. Four of these genes are linked with known immunoglobulin locus rearrangements. We then used five machine learning (ML) methods to build a classifier distinguishing good and poor responders for two cohorts: PAD + VCD (53 patients), and separately VCD (47 patients). We showed that the application of FloWPS dynamic data trimming was beneficial for all ML methods tested in both cohorts, and also in the previous MM bortezomib datasets. However, the ML models built for the different datasets did not allow cross-transferring, which may be due to different treatment regimens, experimental profiling methods, and MM heterogeneity.
Affiliation(s)
- Nicolas Borisov
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Anna Sergeeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Maria Suntsova
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
- Mikhail Raevskiy
  - Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
- Nurshat Gaifullin
  - Department of Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
- Larisa Mendeleeva
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Alexander Gudkov
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
- Maria Nareiko
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Andrew Garazha
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Victor Tkachev
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Xinmin Li
  - Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Maxim Sorokin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States
  - Oncobox Ltd., Research Department, Moscow, Russia
- Vadim Surin
  - National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
- Anton Buzdin
  - I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
  - Omicsway Corp., Research Department, Walnut, CA, United States
9
Using proteomic and transcriptomic data to assess activation of intracellular molecular pathways. Advances in Protein Chemistry and Structural Biology 2021; 127:1-53. [PMID: 34340765 DOI: 10.1016/bs.apcsb.2021.02.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
Analysis of molecular pathway activation is a recent instrument that helps to quantify activities of various intracellular signaling, structural, DNA synthesis and repair, and biochemical processes. This may have a deep impact on fundamental research, bioindustry, and medicine. Unlike gene ontology analyses and numerous qualitative methods that can establish whether a pathway is affected in principle, the quantitative approach has the advantage of measuring the exact extent of pathway up/downregulation. This has led to the emergence of a new generation of molecular biomarkers, pathway activation levels, which reflect concentration changes of all measurable pathway components. The input data can be high-throughput proteomic or transcriptomic profiles, and the output numbers take both positive and negative values and positively reflect overall pathway activation. Due to their nature, pathway activation levels are more robust biomarkers than individual gene product/protein levels. Here, we review the current knowledge of quantitative gene expression interrogation methods and their applications to molecular pathway quantification. We consider the bioinformatic algorithms involved and their applications to solving real-world problems. Besides a plethora of applications in basic life sciences, quantitative pathway analysis can improve molecular design and clinical investigations in the pharmaceutical industry, can help find new active biotechnological components, and can significantly contribute to the progressive evolution of personalized medicine. In addition to the theoretical principles and concepts, we also propose publicly available software for using large-scale protein/RNA expression data to assess human pathway activation levels.
10
Buzdin A, Skvortsova II, Li X, Wang Y. Editorial: Next Generation Sequencing Based Diagnostic Approaches in Clinical Oncology. Front Oncol 2021; 10:635555. [PMID: 33585258 PMCID: PMC7876435 DOI: 10.3389/fonc.2020.635555] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Accepted: 12/14/2020] [Indexed: 01/26/2023]
Affiliation(s)
- Anton Buzdin
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
  - World-Class Research Center "Digital Biodesign and Personalized Healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
  - Translational Genome Bioinformatics Laboratory, Moscow Institute of Physics and Technology (National Research University), Moscow, Russia
  - Research Department, OmicsWay Corp., Walnut, CA, United States
- Ira Ida Skvortsova
  - Therapeutic Radiology and Oncology, Medical University of Innsbruck, Innsbruck, Austria
  - Group for Experimental and Translational Radiooncology, Tyrolean Cancer Research Institute, Innsbruck, Austria
  - PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
- Xinmin Li
  - Department of Pathology & Laboratory Medicine, University of California Los Angeles (UCLA) Technology Center for Genomics & Bioinformatics, Los Angeles, CA, United States
- Ye Wang
  - Clinical Laboratory, Qingdao Central Hospital, The Second Affiliated Hospital of Medical College of Qingdao University, Qingdao, China
|
11
|
Borisov N, Ilnytskyy Y, Byeon B, Kovalchuk O, Kovalchuk I. System, Method and Software for Calculation of a Cannabis Drug Efficiency Index for the Reduction of Inflammation. Int J Mol Sci 2020; 22:ijms22010388. [PMID: 33396562 PMCID: PMC7795809 DOI: 10.3390/ijms22010388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Revised: 12/26/2020] [Accepted: 12/28/2020] [Indexed: 12/19/2022]
Abstract
There are many varieties of Cannabis sativa that differ from each other in their composition of cannabinoids, terpenes and other molecules. The medicinal properties of these cultivars are often very different, with some being more efficient than others. This report describes the development of a method and software for analyzing the efficiency of various cannabis extracts in order to detect their anti-inflammatory properties. The method uses high-throughput gene expression profiling data but can potentially use other omics data as well. According to the signaling pathway topology, the gene expression profiles are convoluted into signaling pathway activities using a signaling pathway impact analysis (SPIA) method. The method was tested by inducing inflammation in human 3D epithelial tissues, including intestine, oral and skin, then exposing these tissues to various extracts and performing transcriptome analysis. The analysis showed that the extracts differed in their efficiency at restoring the transcriptome changes to the pre-inflammation state, thus allowing a different cannabis drug efficiency index (CDEI) to be calculated for each.
Affiliation(s)
- Nicolas Borisov
  - Moscow Institute of Physics and Technology, 9 Institutsky lane, Dolgoprudny, Moscow Region 141701, Russia
- Yaroslav Ilnytskyy
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
  - Pathway Rx., 16 Sandstone Rd. S., Lethbridge, AB T1K 7X8, Canada
- Boseon Byeon
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
  - Pathway Rx., 16 Sandstone Rd. S., Lethbridge, AB T1K 7X8, Canada
  - Biomedical and Health Informatics, Computer Science Department, State University of New York, 2 S Clinton St, Syracuse, NY 13202, USA
- Olga Kovalchuk
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
  - Pathway Rx., 16 Sandstone Rd. S., Lethbridge, AB T1K 7X8, Canada
- Igor Kovalchuk
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada
  - Pathway Rx., 16 Sandstone Rd. S., Lethbridge, AB T1K 7X8, Canada
  - Correspondence:
|
12
|
Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020; 10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]
Abstract
Cancer is the manifestation of abnormalities in different physiological processes involving genes, DNAs, RNAs, proteins, and other biomolecules whose profiles are reflected in different omics data types. As these bio-entities are closely correlated, integrative analysis of different types of omics data (multi-omics data) is required to understand the disease from tumorigenesis to disease progression. Artificial intelligence (AI), specifically machine learning algorithms, has the ability to make decisive interpretations of "big"-sized complex data and, hence, appears to be the most effective tool for the analysis and understanding of multi-omics data for patient-specific observations. In this review, we discuss the recent outcomes of employing AI in multi-omics data analysis of different types of cancer. Based on research trends and significance in patient treatment, we focus primarily on AI-based analysis for determining cancer subtypes, disease prognosis, and therapeutic targets. We also discuss AI analysis of some non-canonical types of omics data, as they can play a determining role in cancer patient care. Additionally, we briefly discuss data repositories because of their pivotal role in multi-omics data storage, processing, and analysis.
Affiliation(s)
- Nupur Biswas
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
- Saikat Chakrabarti
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, IICB TRUE Campus, Kolkata, India
|
13
|
Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020; 13:111. [PMID: 32948183 PMCID: PMC7499993 DOI: 10.1186/s12920-020-00759-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Accepted: 07/27/2020] [Indexed: 12/18/2022]
Abstract
BACKGROUND Machine learning (ML) methods still have limited applicability in personalized oncology because few clinically annotated molecular profiles are available. This does not allow sufficient training of ML classifiers that could be used to improve molecular diagnostics. METHODS We reviewed published datasets of high-throughput gene expression profiles from cancer patients with known responses to chemotherapy treatments. We browsed the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. RESULTS We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All the datasets identified were checked for ML applicability and robustness with leave-one-out cross-validation. Twenty-three datasets, which had balanced numbers of treatment responder and non-responder cases, were found suitable for ML. CONCLUSIONS We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Among them, seven datasets included RNA sequencing data (for 645 cases) and the others included microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.
|
14
|
Bioinformatics Methods in Medical Genetics and Genomics. Int J Mol Sci 2020; 21:ijms21176224. [PMID: 32872128 PMCID: PMC7504073 DOI: 10.3390/ijms21176224] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Accepted: 08/25/2020] [Indexed: 02/06/2023]
Abstract
Medical genomics relies on next-generation sequencing methods to decipher the underlying molecular mechanisms of gene expression. This special issue collects materials originally presented at the “Centenary of Human Population Genetics” Conference-2019 in Moscow. Here we present some recent developments in computational methods tested on actual medical genetics problems, dissected through genomics, transcriptomics and proteomics data analysis, gene networks, protein–protein interactions and biomedical literature mining. We have selected materials based on systems biology approaches and database mining. These methods and algorithms were discussed at the Digital Medical Forum-2019, organized by I.M. Sechenov First Moscow State Medical University, which presented bioinformatics approaches for drug target discovery in cancer, its computational support, and the digitalization of medical research, as well as at the “Systems Biology and Bioinformatics”-2019 (SBB-2019) Young Scientists School in Novosibirsk, Russia. The selected recent advancements in medical genomics and genetics discussed at these events are based on novel bioinformatics tools.
|