Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tkachev V, Sorokin M, Borisov C, Garazha A, Buzdin A, Borisov N. Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology. Int J Mol Sci 2020;21:ijms21030713. [PMID: 31979006 PMCID: PMC7037338 DOI: 10.3390/ijms21030713] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/23/2019] [Revised: 01/16/2020] [Accepted: 01/17/2020] [Indexed: 12/21/2022]

Cited by Other Article(s)

Demsash AW, Chereka AA, Walle AD, Kassie SY, Bekele F, Bekana T. Machine learning algorithms' application to predict childhood vaccination among children aged 12-23 months in Ethiopia: Evidence 2016 Ethiopian Demographic and Health Survey dataset. PLoS One 2023;18:e0288867. [PMID: 37851705 PMCID: PMC10584162 DOI: 10.1371/journal.pone.0288867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 07/06/2023] [Indexed: 10/20/2023]

Abstract

INTRODUCTION

Childhood vaccination is a cost-effective public health intervention to reduce child mortality and morbidity. But, vaccination coverage remains low, and previous similar studies have not focused on machine learning algorithms to predict childhood vaccination. Therefore, knowledge extraction, association rule formulation, and discovering insights from hidden patterns in vaccination data are limited. Therefore, this study aimed to predict childhood vaccination among children aged 12-23 months using the best machine learning algorithm.

METHODS

A cross-sectional study design with a two-stage sampling technique was used. A total of 1617 samples of living children aged 12-23 months were used from the 2016 Ethiopian Demographic and Health Survey dataset. The data were pre-processed, and 70% and 30% of the observations were used for training and evaluating the model, respectively. Eight machine learning algorithms were included for consideration of model building and comparison. All the included algorithms were evaluated using confusion matrix elements. The synthetic minority oversampling technique was used for imbalanced data management. Information gain was used to select important attributes to predict childhood vaccination. If/then logical association was used to generate rules based on relationships among attributes, and Weka version 3.8.6 software was used to perform all the prediction analyses.
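As a rough illustration of the evaluation steps named in this abstract, the sketch below shows a 70/30 split and the usual metrics derived from confusion matrix elements. It is not the authors' Weka workflow: the function names, the fixed random seed, and the rounding of the split point are assumptions made here for illustration only.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and return (train, test) in a 70/30 proportion.

    The seed and the rounding rule are illustrative assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion matrix elements."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # 1617 is the sample size reported in the abstract; the rows are placeholders.
    train, test = split_70_30(range(1617))
    print(len(train), len(test))  # 1132 485
```

With the reported 1617 samples, this rounding rule gives 1132 training and 485 test observations. The abstract does not state how the original split was rounded.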

RESULTS

PART was the first best machine learning algorithm to predict childhood vaccination with 95.53% accuracy. J48, multilayer perceptron, and random forest models were the consecutively best machine learning algorithms to predict childhood vaccination with 89.24%, 87.20%, and 82.37% accuracy, respectively. ANC visits, institutional delivery, health facility visits, higher education, and being rich were the top five attributes to predict childhood vaccination. A total of seven rules were generated that could jointly determine the magnitude of childhood vaccination. Of these, if wealth status = 3 (Rich), adequate ANC visits = 1 (yes), and residency = 2 (Urban), then the probability of childhood vaccination would be 86.73%.

CONCLUSIONS

The PART, J48, multilayer perceptron, and random forest algorithms were important algorithms for predicting childhood vaccination. The findings would provide insight into childhood vaccination and serve as a framework for further studies. Strengthening mothers' ANC visits, institutional delivery, improving maternal education, and creating income opportunities for mothers could be important interventions to enhance childhood vaccination.

Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023;10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]

Abstract

Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens avenue for comprehensive comparison of relevant features like differentially expressed genes associated with disease. Currently, most of bioinformatic tools enable normalization in a flexible format that depends on the individual datasets under analysis. Thus, the output data of such normalizations will be poorly compatible with each other. Recently we proposed a new approach to gene expression data normalization termed Shambhala which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria for different versions of Shambhala and other methods of transcriptomic harmonization with flexible output data format. Such criteria dealt with the biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for using machine learning classifiers. Discussion: Shambhala-2 harmonizer demonstrated the best results with the close to 1 correlation and linear regression coefficients for the comparison of training vs validation datasets and more than two times lesser instability for calculation of drug efficiency scores compared to other methods.

Li G, Zhang X, Song X, Duan L, Wang G, Xiao Q, Li J, Liang L, Bai L, Bai S. Machine learning for predicting accuracy of lung and liver tumor motion tracking using radiomic features. Quant Imaging Med Surg 2023;13:1605-1618. [PMID: 36915317 PMCID: PMC10006135 DOI: 10.21037/qims-22-621] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/17/2022] [Accepted: 12/02/2022] [Indexed: 01/11/2023]

Abstract

Background

Internal tumor motion is commonly predicted using external respiratory signals. However, the internal/external correlation is complex and patient-specific. The purpose of this study was to develop various models based on the radiomic features of computed tomography (CT) images to predict the accuracy of tumor motion tracking using external surrogates and to find accurate and reliable tracking algorithms.

Methods

Images obtained from a total of 108 and 71 patients pathologically diagnosed with lung and liver cancers, respectively, were examined. Real-time position monitoring motion was fitted to tumor motion, and samples with fitting errors greater than 2 mm were considered positive. Radiomic features were extracted from internal target volumes of average intensity projections, and cross-validation least absolute shrinkage and selection operator (LassoCV) was used to conduct feature selection. Based on the radiomic features, a total of 26 separate models (13 for the lung and 13 for the liver) were trained and tested. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were used to assess performance. Relative standard deviation was used to assess stability.
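The stability measure mentioned above, relative standard deviation, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the sample or population standard deviation was used, so the sample form is assumed here, and the function name is hypothetical.

```python
import statistics


def relative_std_dev(values):
    """Relative standard deviation in percent: 100 * stdev / |mean|.

    Uses the sample standard deviation (n - 1 denominator); that choice
    is an assumption, since the abstract does not specify it.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("relative standard deviation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / abs(mean)


if __name__ == "__main__":
    # Hypothetical AUC values from repeated runs of one model.
    aucs = [0.91, 0.93, 0.92, 0.94, 0.90]
    print(f"RSD = {relative_std_dev(aucs):.2f}%")
```

A lower value indicates that a model's score varies less across repeated runs, which is the sense in which the abstract compares stability between models.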

Results

Thirty-three and 22 radiomic features were selected for the lung and liver, respectively. For the lung, the AUC varied from 0.848 (decision tree) to 0.941 [support vector classifier (SVC), logistic regression]; sensitivity varied from 0.723 (extreme gradient boosting) to 0.848 [linear support vector classifier (linearSVC)]; specificity varied from 0.834 (gaussian naive bayes) to 0.936 [multilayer perceptron (MLP), wide and deep (W&D)]; and MLP and W&D had better performance and stability than the median. For the liver, the AUC varied from 0.677 [light gradient boosting machine (Light)] to 0.892 (logistic regression); sensitivity varied from 0.717 (W&D) to 0.862 (MLP); specificity varied from 0.566 (Light) to 0.829 (linearSVC); and logistic regression, MLP, and SVC had better performance and stability than the median.

Conclusions

Respiratory-sensitive radiomic features extracted from CT images of lung and liver tumors were proved to contain sufficient information to establish an external/internal motion relationship. We developed a rapid and accurate method based on radiomics to classify the accuracy of monitoring a patient's external surface for lung and liver tumor tracking. Several machine learning algorithms, in particular MLP, demonstrated excellent classification performance and stability.

Borisov N, Buzdin A. Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect. Biomedicines 2022;10:2318. [PMID: 36140419 PMCID: PMC9496268 DOI: 10.3390/biomedicines10092318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]

Borisov N, Sorokin M, Zolotovskaya M, Borisov C, Buzdin A. Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats. Curr Protoc 2022;2:e444. [PMID: 35617464 DOI: 10.1002/cpz1.444] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]

Abstract

Uniformly shaped harmonization of gene expression profiles is central for the simultaneous comparison of multiple gene expression datasets. It is expected to operate with the gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can be further compared directly. However, current harmonization techniques have strong limitations that prevent their broad use for bioinformatic applications. They can either operate with only up to two datasets/platforms or return data in a dynamic format that will be different for every comparison under analysis. This also does not allow for adding new data to the previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity in restoring tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles, and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preserving cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC. 
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)

Arjmand B, Hamidpour SK, Tayanloo-Beik A, Goodarzi P, Aghayan HR, Adibi H, Larijani B. Machine Learning: A New Prospect in Multi-Omics Data Analysis of Cancer. Front Genet 2022;13:824451. [PMID: 35154283 PMCID: PMC8829119 DOI: 10.3389/fgene.2022.824451] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/29/2021] [Accepted: 01/10/2022] [Indexed: 12/11/2022]

Abstract

Cancer is defined as a large group of diseases that is associated with abnormal cell growth, uncontrollable cell division, and may tend to impinge on other tissues of the body by different mechanisms through metastasis. What makes cancer so important is that the cancer incidence rate is growing worldwide which can have major health, economic, and even social impacts on both patients and the governments. Thereby, the early cancer prognosis, diagnosis, and treatment can play a crucial role at the front line of combating cancer. The onset and progression of cancer can occur under the influence of complicated mechanisms and some alterations in the level of genome, proteome, transcriptome, metabolome etc. Consequently, the advent of omics science and its broad research branches (such as genomics, proteomics, transcriptomics, metabolomics, and so forth) as revolutionary biological approaches have opened new doors to the comprehensive perception of the cancer landscape. Due to the complexities of the formation and development of cancer, the study of mechanisms underlying cancer has gone beyond just one field of the omics arena. Therefore, making a connection between the resultant data from different branches of omics science and examining them in a multi-omics field can pave the way for facilitating the discovery of novel prognostic, diagnostic, and therapeutic approaches. As the volume and complexity of data from the omics studies in cancer are increasing dramatically, the use of leading-edge technologies such as machine learning can have a promising role in the assessments of cancer research resultant data. Machine learning is categorized as a subset of artificial intelligence which aims to data parsing, classification, and data pattern identification by applying statistical methods and algorithms. This acquired knowledge subsequently allows computers to learn and improve accurate predictions through experiences from data processing.
In this context, the application of machine learning, as a novel computational technology offers new opportunities for achieving in-depth knowledge of cancer by analysis of resultant data from multi-omics studies. Therefore, it can be concluded that the use of artificial intelligence technologies such as machine learning can have revolutionary roles in the fight against cancer.

Anashkina AA, Leberfarb EY, Orlov YL. Recent Trends in Cancer Genomics and Bioinformatics Tools Development. Int J Mol Sci 2021;22:ijms222212146. [PMID: 34830028 PMCID: PMC8618360 DOI: 10.3390/ijms222212146] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/04/2021] [Accepted: 11/08/2021] [Indexed: 02/07/2023]

Borisov N, Sergeeva A, Suntsova M, Raevskiy M, Gaifullin N, Mendeleeva L, Gudkov A, Nareiko M, Garazha A, Tkachev V, Li X, Sorokin M, Surin V, Buzdin A. Machine Learning Applicability for Classification of PAD/VCD Chemotherapy Response Using 53 Multiple Myeloma RNA Sequencing Profiles. Front Oncol 2021;11:652063. [PMID: 33937058 PMCID: PMC8083158 DOI: 10.3389/fonc.2021.652063] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/11/2021] [Accepted: 03/19/2021] [Indexed: 12/17/2022]

Affiliation(s)

Nicolas Borisov: Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
Anna Sergeeva: National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
Maria Suntsova: I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia
Mikhail Raevskiy: Moscow Institute of Physics and Technology, Laboratory for Translational Genomic Bioinformatics, Dolgoprudny, Russia
Nurshat Gaifullin: Department of Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
Larisa Mendeleeva: National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
Alexander Gudkov: I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia
Maria Nareiko: National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
Andrew Garazha: Omicsway Corp., Research Department, Walnut, CA, United States; Oncobox Ltd., Research Department, Moscow, Russia
Victor Tkachev: Omicsway Corp., Research Department, Walnut, CA, United States; Oncobox Ltd., Research Department, Moscow, Russia
Xinmin Li: Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA, United States
Maxim Sorokin: I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia; Omicsway Corp., Research Department, Walnut, CA, United States; Oncobox Ltd., Research Department, Moscow, Russia
Vadim Surin: National Research Center for Hematology, Ministry of Health of the Russian Federation, Moscow, Russia
Anton Buzdin: I.M. Sechenov First Moscow State Medical University, Institute of Personalized Medicine, Moscow, Russia; Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Group for Genomic Analysis of Cell Signaling Systems, Moscow, Russia; Omicsway Corp., Research Department, Walnut, CA, United States

Using proteomic and transcriptomic data to assess activation of intracellular molecular pathways. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021;127:1-53. [PMID: 34340765 DOI: 10.1016/bs.apcsb.2021.02.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]

Buzdin A, Skvortsova II, Li X, Wang Y. Editorial: Next Generation Sequencing Based Diagnostic Approaches in Clinical Oncology. Front Oncol 2021;10:635555. [PMID: 33585258 PMCID: PMC7876435 DOI: 10.3389/fonc.2020.635555] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Accepted: 12/14/2020] [Indexed: 01/26/2023]

Borisov N, Ilnytskyy Y, Byeon B, Kovalchuk O, Kovalchuk I. System, Method and Software for Calculation of a Cannabis Drug Efficiency Index for the Reduction of Inflammation. Int J Mol Sci 2020;22:ijms22010388. [PMID: 33396562 PMCID: PMC7795809 DOI: 10.3390/ijms22010388] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2020] [Revised: 12/26/2020] [Accepted: 12/28/2020] [Indexed: 12/19/2022]

Biswas N, Chakrabarti S. Artificial Intelligence (AI)-Based Systems Biology Approaches in Multi-Omics Data Analysis of Cancer. Front Oncol 2020;10:588221. [PMID: 33154949 PMCID: PMC7591760 DOI: 10.3389/fonc.2020.588221] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/28/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]

Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments. BMC Med Genomics 2020;13:111. [PMID: 32948183 PMCID: PMC7499993 DOI: 10.1186/s12920-020-00759-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Accepted: 07/27/2020] [Indexed: 12/18/2022]

Bioinformatics Methods in Medical Genetics and Genomics. Int J Mol Sci 2020;21:ijms21176224. [PMID: 32872128 PMCID: PMC7504073 DOI: 10.3390/ijms21176224] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Accepted: 08/25/2020] [Indexed: 02/06/2023]