1
Khabadze Z, Mordanov O, Shilyaeva E. Comparative Analysis of 3D Cephalometry Provided with Artificial Intelligence and Manual Tracing. Diagnostics (Basel) 2024; 14:2524. [PMID: 39594190 PMCID: PMC11592480 DOI: 10.3390/diagnostics14222524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 11/03/2024] [Accepted: 11/08/2024] [Indexed: 11/28/2024]
Abstract
OBJECTIVES To compare 3D cephalometric analysis performed using AI with that conducted manually by a specialist orthodontist. METHODS The CBCT scans (field of view 15 × 15 cm) used in the study were obtained from 30 consecutive patients aged 18 to 50 years. The 3D cephalometric analysis was conducted using two methods. The first method involved manual tracing performed with the Invivo 6 software (Anatomage Inc., Santa Clara, CA, USA). The second method used AI for cephalometric measurements as part of an orthodontic report generated by the Diagnocat system (Diagnocat Ltd., San Francisco, CA, USA). RESULTS A statistically significant difference within one standard deviation of the parameter was found in the following measurements: SNA, SNB, and the left interincisal angle. Statistically significant differences within two standard deviations were noted in the following measurements: the right and left gonial angles, the left upper incisor, and the right lower incisor. No statistically significant differences were observed beyond two standard deviations. CONCLUSIONS AI, as implemented in Diagnocat, proved effective in assessing the mandibular growth direction, defining the skeletal class, and estimating the overbite, overjet, and Wits parameter.
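The 3D angular measurements compared above (for example SNA, the angle at nasion between sella and A point) reduce to the angle between two vectors built from landmark coordinates, and a method disagreement can be expressed in units of the parameter's standard deviation. The following is a minimal Python sketch, not the Diagnocat or Invivo implementation; the helper names and the landmark coordinates in the usage block are hypothetical values chosen for illustration.

```python
import math


def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex` between the rays vertex->p1 and vertex->p2 (3D points)."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] so floating-point error cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def difference_in_sd(ai_value, manual_value, norm_sd):
    """Hypothetical helper: absolute AI-vs-manual difference in units of the parameter's SD."""
    return abs(ai_value - manual_value) / norm_sd


if __name__ == "__main__":
    # Hypothetical landmark coordinates in millimetres, for illustration only.
    sella = (-68.0, 5.0, 0.0)
    nasion = (0.0, 0.0, 0.0)
    a_point = (-8.0, -52.0, 0.0)
    sna = angle_at(nasion, sella, a_point)
    print(f"SNA = {sna:.1f} deg")
```

A difference whose `difference_in_sd` value is at most 1 would fall "within one standard deviation" in the sense used above, assuming the normative SD for that parameter is known.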
Affiliation(s)
- Oleg Mordanov
- Department of Therapeutic Dentistry, Peoples’ Friendship University of Russia Named after Patrice Lumumba (RUDN University), 6 Miklukho-Maklaya St., 117198 Moscow, Russia
2
Nazaret A, Fan JL, Lavallée VP, Burdziak C, Cornish AE, Kiseliovas V, Bowman RL, Masilionis I, Chun J, Eisman SE, Wang J, Hong J, Shi L, Levine RL, Mazutis L, Blei D, Pe’er D, Azizi E. Joint representation and visualization of derailed cell states with Decipher. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.11.566719. [PMID: 38014231 PMCID: PMC10680623 DOI: 10.1101/2023.11.11.566719] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/29/2023]
Abstract
Biological insights often depend on comparing conditions such as disease and health, yet we lack effective computational tools for integrating single-cell genomics data across conditions or characterizing transitions from normal to deviant cell states. Here, we present Decipher, a deep generative model that characterizes derailed cell-state trajectories. Decipher jointly models and visualizes gene expression and cell state from normal and perturbed single-cell RNA-seq data, revealing shared and disrupted dynamics. We demonstrate its superior performance across diverse contexts, including in pancreatitis with oncogene mutation, acute myeloid leukemia, and gastric cancer.
Affiliation(s)
- Achille Nazaret
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Joy Linyue Fan
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA
- Vincent-Philippe Lavallée
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Centre Hospitalier Universitaire Sainte-Justine Research Center, Montréal, QC, Canada
- Department of Pediatrics, Université de Montréal, Montréal, QC, Canada
- Cassandra Burdziak
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Andrew E. Cornish
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Immunology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Vaidotas Kiseliovas
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Alan and Sandra Gerry Metastasis and Tumor Ecosystems Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Robert L. Bowman
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ignas Masilionis
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Alan and Sandra Gerry Metastasis and Tumor Ecosystems Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jaeyoung Chun
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Alan and Sandra Gerry Metastasis and Tumor Ecosystems Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Shira E. Eisman
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- James Wang
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Justin Hong
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Lingting Shi
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Ross L. Levine
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Linas Mazutis
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Alan and Sandra Gerry Metastasis and Tumor Ecosystems Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Institute of Biotechnology Vilnius University, Life Sciences Centre, Vilnius 02158, Lithuania
- David Blei
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Department of Statistics, Columbia University, New York, NY 10027, USA
- Data Science Institute, Columbia University, New York, NY 10027, USA
- Dana Pe’er
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Howard Hughes Medical Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Elham Azizi
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA
- Data Science Institute, Columbia University, New York, NY 10027, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, USA
3
Alabdulqader EA, Alarfaj AA, Umer M, Eshmawi AA, Alsubai S, Kim TH, Ashraf I. Improving prediction of blood cancer using leukemia microarray gene data and Chi2 features with weighted convolutional neural network. Sci Rep 2024; 14:15625. [PMID: 38972881 PMCID: PMC11228030 DOI: 10.1038/s41598-024-65315-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 06/19/2024] [Indexed: 07/09/2024]
Abstract
Blood cancer has emerged as a growing concern over the past decade, necessitating early diagnosis for timely and effective treatment. The present diagnostic method, which involves a battery of tests and medical experts, is costly and time-consuming. For this reason, it is crucial to establish an automated diagnostic system for accurate predictions. A particular focus of medical research is the use of machine learning and leukemia microarray gene data for blood cancer diagnosis. Despite a great deal of research, further improvements are needed to reach appropriate levels of accuracy and efficacy. This work presents a supervised machine-learning approach for blood cancer prediction using leukemia microarray data covering 22,283 genes. Chi-squared (Chi2) feature selection and synthetic minority oversampling technique (SMOTE)-Tomek resampling are used to address the problems of imbalanced and high-dimensional datasets: SMOTE-Tomek creates synthetic data to balance the dataset for each target class, and Chi2 selects the most important of the 22,283 genes for training the learning models. A novel weighted convolutional neural network (CNN) model, built on three separate CNN models, is proposed for classification. To assess the proposed approach, extensive experiments are carried out on the datasets, including a performance comparison with state-of-the-art techniques. When coupled with SMOTE-Tomek and Chi2, the weighted CNN outperforms the other models, achieving 99.9% accuracy. Results from k-fold cross-validation further support the superiority of the proposed model.
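The Chi2 step ranks genes by how strongly their values depend on the class label. The sketch below is a plain-Python version of the chi-squared score as formulated in scikit-learn's `chi2` (non-negative feature values treated as counts); it is an illustration under that assumption, not the authors' code, and it omits SMOTE-Tomek resampling and the weighted CNN. The function names and the toy matrix are hypothetical.

```python
def chi2_scores(X, y):
    """Chi-squared score of each column of X (non-negative values) against labels y."""
    classes = sorted(set(y))
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        total = sum(column)
        score = 0.0
        for c in classes:
            # Observed: feature mass in class c; expected: total mass times class prior.
            observed = sum(v for v, label in zip(column, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices (in ascending order) of the k columns with the highest chi-squared score."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: column 0 tracks the label, column 1 is constant.
    X = [[5, 1], [4, 1], [0, 1], [1, 1]]
    y = [0, 0, 1, 1]
    print(chi2_scores(X, y), select_k_best(X, y, 1))
```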
Affiliation(s)
- Ebtisam Abdullah Alabdulqader
- Department of Information Technology, College of Computer and Information Sciences, King Saud University, P.O. Box 800, 11421, Riyadh, Saudi Arabia
- Aisha Ahmed Alarfaj
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Muhammad Umer
- Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan
- Ala' Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, 23218, Saudi Arabia
- Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, 11942, Al-Kharj, Saudi Arabia
- Tai-Hoon Kim
- School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, 50, Daehak-ro, Yeosu-si, 59626, Jeollanam-do, Republic of Korea
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Korea
4
Almorox L, Antequera L, Rojas I, Herrera LJ, Ortuño FM. Gene Expression Analysis for Uterine Cervix and Corpus Cancer Characterization. Genes (Basel) 2024; 15:312. [PMID: 38540371 PMCID: PMC10970626 DOI: 10.3390/genes15030312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 06/14/2024]
Abstract
The analysis of gene expression quantification data is a powerful and widely used approach in cancer research. This work provides new insights into the transcriptomic differences between healthy and cancerous uterine tissue and explores the differences associated with uterine cancer localizations and histological subtypes. To achieve this, RNA-Seq data from the TCGA database were preprocessed and analyzed using the KnowSeq package. Firstly, a kNN model was applied to classify uterine cervix cancer, uterine corpus cancer, and healthy uterine samples. Through variable selection, a three-gene signature (VWCE, CLDN15, ADCYAP1R1) was identified, achieving a consistent 100% test accuracy across 20 repetitions of 5-fold cross-validation. A similar supplementary analysis using miRNA-Seq data from the same samples identified an optimal two-gene miRNA-coding signature, potentially regulating the three-gene signature mentioned above, which attained its best classification performance with a macro F1 score of 82%. Subsequently, a kNN model was implemented to classify cervical cancer samples into their two main histological subtypes (adenocarcinoma and squamous cell carcinoma). A single-gene signature (ICA1L) was identified, achieving 100% test accuracy across 20 repetitions of 5-fold cross-validation, and was externally validated using data from the CGCI program. Finally, an examination of six cervical adenosquamous carcinoma (mixed) samples revealed that the gene expression values in the mixed class aligned more closely with the histological subtype showing lower expression, prompting a reconsideration of the diagnosis for these mixed samples. In summary, this study provides valuable insights into the molecular mechanisms of uterine cervix and corpus cancers. The newly identified gene signatures demonstrate robust predictive capabilities, guiding future research in cancer diagnosis and treatment methodologies.
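The evaluation protocol above (a kNN classifier scored by 20 repetitions of 5-fold cross-validation) can be sketched in plain Python as follows. This is a generic illustration, not the KnowSeq implementation: the folds are not stratified, k = 3 neighbours and Euclidean distance are assumptions, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def repeated_kfold_accuracy(X, y, k_folds=5, repeats=20, k=3, seed=0):
    """Mean test accuracy over `repeats` reshuffled rounds of k_folds-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k_folds] for f in range(k_folds)]
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            correct = sum(knn_predict(train_X, train_y, X[t], k) == y[t] for t in test_idx)
            accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Averaging over reshuffled repetitions reduces the dependence of the reported accuracy on one particular split of a small cohort.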
Affiliation(s)
- Ignacio Rojas
- Department of Computer Engineering, Automatics and Robotics, C.I.T.I.C., University of Granada, Periodista Rafael Gómez Montero, 2, 18014 Granada, Spain
5
Angelakis A, Soulioti I, Filippakis M. Diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees. Heliyon 2023; 9:e20530. [PMID: 37860531 PMCID: PMC10582309 DOI: 10.1016/j.heliyon.2023.e20530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 09/27/2023] [Accepted: 09/28/2023] [Indexed: 10/21/2023]
Abstract
We define an iterative method for dimensionality reduction using categorical gradient boosted trees and Shapley values, and we create four machine learning models that could potentially be used as diagnostic tests for acute myeloid leukaemia (AML). For the final CatBoost model, we use a dataset of 2177 individuals, with 16 probe sets and age as features, to classify whether someone has AML or is healthy. The dataset is multicentric and consists of data from 27 organizations, 25 cities, 15 countries and 4 continents. Using ten-fold cross-validation, our final model achieves a specificity of 0.9909, a sensitivity of 0.9985, an F1-score of 0.9976 and a ROC-AUC of 0.9962. On an inference dataset, its performance is: specificity 0.9909, sensitivity 0.9969, F1-score 0.9969 and ROC-AUC 0.9939. To the best of our knowledge, this is the best performance reported in the literature for the diagnosis of AML, whether using similar or different data. Moreover, no previous bibliographic reference associates AML or any other type of cancer with the 16 probe sets we used as features in our final model.
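The metrics reported above follow from the confusion matrix of a binary AML-versus-healthy classifier, and ROC-AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counting one half). A minimal plain-Python sketch, assuming AML is encoded as the positive class `1` and that both classes are present; the function names are hypothetical and this is not the authors' evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and F1 from hard predictions (assumes no empty denominators)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)   # recall on the positive (AML) class
    specificity = tn / (tn + fp)   # recall on the negative (healthy) class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```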
Affiliation(s)
- Athanasios Angelakis
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam Public Health Research Institute, University of Amsterdam Data Science Center, Netherlands
- Ioanna Soulioti
- Department of Biology, National and Kapodistrian University of Athens, Greece
6
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in assessing the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, which limits their statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. The methodology was validated in two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
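A common first step when merging microarray datasets is to keep only the genes measured in every dataset and to standardize each gene within its source dataset before concatenating samples, which removes per-dataset shifts in location and scale. The sketch below shows only that step; it is an assumed, simplified preprocessing scheme, not the MicroGES pipeline, and the function name and gene names are hypothetical.

```python
from statistics import mean, pstdev


def merge_datasets(datasets):
    """Merge datasets given as dicts {gene: [value per sample]} on their common genes.

    Each gene is z-scored within its own dataset before the samples are concatenated;
    a gene with zero variance in a dataset contributes zeros for that dataset.
    """
    common = sorted(set.intersection(*(set(d) for d in datasets)))
    merged = {gene: [] for gene in common}
    for d in datasets:
        for gene in common:
            values = d[gene]
            mu = mean(values)
            sd = pstdev(values)
            merged[gene].extend((v - mu) / sd if sd > 0 else 0.0 for v in values)
    return merged


if __name__ == "__main__":
    study_1 = {"A": [1, 3], "B": [2, 2], "C": [5, 7]}
    study_2 = {"A": [10, 30], "B": [4, 6]}
    print(merge_datasets([study_1, study_2]))
```

Gene "C" is dropped because it is absent from the second study; per-gene standardization is one of several possible batch adjustments, chosen here only for simplicity.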
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal
7
Ferrato MH, Marsh AG, Franke KR, Huang BJ, Kolb EA, DeRyckere D, Graham DK, Chandrasekaran S, Crowgey EL. Machine learning classifier approaches for predicting response to RTK-type-III inhibitors demonstrate high accuracy using transcriptomic signatures and ex vivo data. BIOINFORMATICS ADVANCES 2023; 3:vbad034. [PMID: 37250111 PMCID: PMC10209528 DOI: 10.1093/bioadv/vbad034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 02/16/2023] [Accepted: 03/21/2023] [Indexed: 05/31/2023]
Abstract
Motivation The application of machine learning (ML) techniques in the medical field has met with both successes and challenges in the precision medicine era. Accurately classifying a subject as a potential responder or nonresponder to a given therapy is still an active area of research, pushing the field to create new approaches for applying ML techniques. In this study, we leveraged publicly available data from the BeatAML initiative. Specifically, we used gene count data, generated via RNA-seq, from 451 individuals, matched with ex vivo data generated from treatment with RTK-type-III inhibitors. Three feature selection techniques (principal component analysis, the Shapley Additive Explanation [SHAP] technique and differential gene expression analysis) were tested with three different classifiers (XGBoost, LightGBM and random forest [RF]). Sensitivity versus specificity was analyzed using the area under the receiver operating characteristic (ROC) curve (AUC) for every model developed. Results Our work demonstrated that the feature selection technique, rather than the classifier, had the greatest impact on model performance. The SHAP technique outperformed the other feature selection techniques and predicted outcome response with high accuracy; the highest-performing model was for foretinib, with an AUC of 89% using the SHAP technique and the RF classifier. Our ML pipelines demonstrate that, at the time of diagnosis, a transcriptomic signature exists that can potentially predict response to treatment, showing the potential of ML applications in precision medicine efforts. Availability and implementation https://github.com/UD-CRPL/RCDML. Supplementary information Supplementary data are available at Bioinformatics Advances online.
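SHAP-based feature selection ranks genes by the size of their Shapley-value attributions. The sketch below computes exact Shapley values for one sample against a single baseline (features outside a coalition take the baseline value) and ranks features by mean absolute attribution across samples. It enumerates every coalition, so it is exponential in the number of features and only illustrates the definition; practical tools such as the `shap` package use far more efficient algorithms. The function names and toy model are hypothetical, not the authors' pipeline.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of sample x relative to a single baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the sample's values; the rest take baseline values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


def rank_by_mean_abs_shap(model, samples, baseline):
    """Feature indices sorted by mean |Shapley value| over the samples, largest first."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, phi in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(phi)
    means = [t / len(samples) for t in totals]
    return sorted(range(n), key=lambda i: means[i], reverse=True)
```

The top-ranked indices would then be the retained features passed to a classifier such as a random forest.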
Affiliation(s)
- Karl R Franke
- Nemours Children Health System, Wilmington, DE 19803, USA
- Benjamin J Huang
- Department of Pediatrics, University of California San Francisco, San Francisco, CA 94143, USA
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94143, USA
- E Anders Kolb
- Nemours Children Health System, Wilmington, DE 19803, USA
- Deborah DeRyckere
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Douglas K Graham
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
8
El Alaoui Y, Elomri A, Qaraqe M, Padmanabhan R, Yasin Taha R, El Omri H, El Omri A, Aboumarzouk O. A Review of Artificial Intelligence Applications in Hematology Management: Current Practices and Future Prospects. J Med Internet Res 2022; 24:e36490. [PMID: 35819826 PMCID: PMC9328784 DOI: 10.2196/36490] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/16/2022] [Revised: 05/14/2022] [Accepted: 05/29/2022] [Indexed: 12/23/2022]
Abstract
Background Machine learning (ML) and deep learning (DL) methods have recently garnered a great deal of attention in cancer research, making a noticeable contribution to the growth of predictive medicine and modern oncological practice. Considerable focus has been directed toward hematologic malignancies in particular, because their early symptoms are difficult to detect. Many patients with blood cancer are not properly diagnosed until their cancer has reached an advanced stage with limited treatment prospects. Hence, the state of the art revolves around the latest artificial intelligence (AI) applications in hematology management. Objective This comprehensive review provides an in-depth analysis of current AI practices in the field of hematology. Our objective is to explore ML and DL applications in blood cancer research, with a special focus on the type of hematologic malignancy and the patient’s cancer stage, to determine future research directions in blood cancer. Methods We searched a set of recognized databases (Scopus, Springer, and Web of Science) using a selected set of keywords. We included studies written in English and published between 2015 and 2021. For each study, we identified the ML and DL techniques used and highlighted the performance of each model. Results Using the aforementioned inclusion criteria, the search returned 567 papers, of which 144 were selected for review. Conclusions The current literature suggests that the application of AI in hematology has generated impressive results in the screening, diagnosis, and treatment stages. Nevertheless, optimizing the patient’s pathway to treatment requires a prior prediction of the malignancy based on the patient’s symptoms or blood records, an area that has not yet been properly investigated.
Affiliation(s)
- Yousra El Alaoui
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Adel Elomri
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Marwa Qaraqe
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Regina Padmanabhan
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Ruba Yasin Taha
- National Center for Cancer Care and Research, Hamad Medical Corporation, Doha, Qatar
- Halima El Omri
- National Center for Cancer Care and Research, Hamad Medical Corporation, Doha, Qatar
- Abdelfatteh El Omri
- Surgical Research Section, Department of Surgery, Hamad Medical Corporation, Doha, Qatar
- Omar Aboumarzouk
- Surgical Research Section, Department of Surgery, Hamad Medical Corporation, Doha, Qatar; College of Medicine, Qatar University, Doha, Qatar; College of Medicine, University of Glasgow, Glasgow, United Kingdom
9
Prabhakar SK, Ryu S, Jeong IC, Won DO. A Dual Level Analysis with Evolutionary Computing and Swarm Models for Classification of Leukemia. BIOMED RESEARCH INTERNATIONAL 2022; 2022:2052061. [PMID: 35663047 PMCID: PMC9162867 DOI: 10.1155/2022/2052061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/01/2021] [Revised: 03/17/2022] [Accepted: 03/28/2022] [Indexed: 11/17/2022]
Abstract
Cancer is one of the major causes of mortality in human beings, and it is essential for doctors to identify and treat the people suffering from it. Leukemia is a group of blood cancers that usually originates in the bone marrow and results in a very high number of abnormal cells. Microarray data are an important clinical resource for cancer diagnosis and a great aid to the entire medical community. Because the dimensionality of microarray data is very high, selecting suitable genes is an important step in improving data classification. Therefore, for the prediction and diagnosis of cancer, it is essential to select the most informative genes. In this work, Minimum Redundancy Maximum Relevance (MRMR), Signal to Noise Ratio (SNR), Multivariate Error Weight Uncorrelated Shrunken Centroid (EWUSC), and multivariate correlation-based feature selection (CFS) are chosen as initial feature selection techniques. Then, to select the most informative genes, five evolutionary optimization techniques are incorporated: African Buffalo Optimization (ABO), Artificial Bee Colony Optimization (ABCO), Cockroach Swarm Optimization (CSO), Imperialist Competitive Optimization (ICO), and Social Spider Optimization (SSO). Finally, the optimized values are passed to the classification stage; the best result, a classification accuracy of 95.70%, is obtained when multivariate CFS is combined with SSO and classified with a Probabilistic Neural Network (PNN).
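Of the initial filters listed above, the signal-to-noise ratio is the simplest to state: for each gene, the difference between the two class means divided by the sum of the class standard deviations, with genes ranked by its absolute value. The sketch below uses that common two-class definition with sample standard deviations; both choices are assumptions, since the abstract does not spell out the exact variant, and the function names, gene names and class labels are hypothetical. The evolutionary optimizers and the PNN classifier are not shown.

```python
from statistics import mean, stdev


def snr(values_a, values_b):
    """Signal-to-noise ratio of one gene between class A and class B samples.

    Raises ZeroDivisionError if both classes have zero spread for this gene.
    """
    spread = stdev(values_a) + stdev(values_b)
    return (mean(values_a) - mean(values_b)) / spread


def top_genes_by_snr(expression, labels, k, class_a, class_b):
    """Return the k gene names with the largest |SNR|.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: class label of each sample, aligned with the value lists.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = abs(snr(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```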
Affiliation(s)
- Sunil Kumar Prabhakar
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
- Semin Ryu
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
- In cheol Jeong
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
- Dong-Ok Won
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
10
Carrillo-Perez F, Morales JC, Castillo-Secilla D, Gevaert O, Rojas I, Herrera LJ. Machine-Learning-Based Late Fusion on Multi-Omics and Multi-Scale Data for Non-Small-Cell Lung Cancer Diagnosis. J Pers Med 2022; 12:601. [PMID: 35455716 PMCID: PMC9025878 DOI: 10.3390/jpm12040601] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/12/2022] [Revised: 03/29/2022] [Accepted: 04/06/2022] [Indexed: 01/27/2023]
Abstract
Differentiating between the various non-small-cell lung cancer subtypes is crucial for providing effective treatment to the patient. For this purpose, machine learning techniques have been applied in recent years to the available biological data from patients. However, in most cases this problem has been treated using a single-modality approach, without exploring the potential of the multi-scale and multi-omic nature of cancer data for classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and explore the interactions and gains obtained by incrementally fusing their outputs, using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81 ± 1.07, an AUC of 0.993 ± 0.004, and an AUPRC of 0.980 ± 0.016, outperforming each independent model as well as the results previously reported in the literature for this problem. These results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving patient diagnosis.
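Late fusion combines the class-probability outputs of the per-modality models rather than their input features. A minimal sketch, assuming a weighted average of probabilities with fixed, user-supplied weights; the paper computes its fusion parameters with its own optimization approach, which is not reproduced here, and the function names and probabilities are hypothetical.

```python
def late_fusion(prob_lists, weights):
    """Weighted average of per-modality class probabilities.

    prob_lists: one entry per modality; each is a list (one item per sample)
    of class-probability lists. weights: one non-negative weight per modality.
    """
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    fused = []
    for s in range(n_samples):
        fused.append([
            sum(w * probs[s][c] for w, probs in zip(weights, prob_lists)) / total
            for c in range(n_classes)
        ])
    return fused


def predict(fused):
    """Index of the most probable class for each sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in fused]


if __name__ == "__main__":
    rna_seq = [[0.6, 0.4], [0.3, 0.7]]   # hypothetical outputs of a modality model
    imaging = [[0.2, 0.8], [0.9, 0.1]]   # hypothetical outputs of a second model
    print(predict(late_fusion([rna_seq, imaging], [1.0, 1.0])))
```

Adding modalities one at a time and re-fitting the weights would correspond to the incremental fusion described above.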
Affiliation(s)
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA 94305, USA
- Juan Carlos Morales
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
- Daniel Castillo-Secilla
- Fujitsu Technology Solutions S.A, CoE Data Intelligence, Camino del Cerro de los Gamos, 1, Pozuelo de Alarcón, 28224 Madrid, Spain
- Olivier Gevaert
- Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA 94305, USA
- Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18170 Granada, Spain
11
Bajo-Morales J, Prieto-Prieto JC, Herrera LJ, Rojas I, Castillo-Secilla D. COVID-19 Biomarkers Recognition & Classification Using Intelligent Systems. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220328125029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
Abstract
Background:
SARS-CoV-2 has paralyzed mankind due to its high transmissibility and its associated mortality, causing millions of infections and deaths worldwide. The search for gene expression biomarkers from the host transcriptional response to infection may help understand the underlying mechanisms by which the virus causes COVID-19. This research proposes a smart methodology integrating different RNA-Seq datasets from SARS-CoV-2, other respiratory diseases, and healthy patients.
Methods:
The proposed pipeline exploits the functionality of the ‘KnowSeq’ R/Bioc package, integrating different data sources to obtain a significantly larger gene expression dataset, which gives the results higher statistical significance and robustness than previous studies in the literature. A detailed preprocessing step was carried out to homogenize the samples and build a clinical decision system for SARS-CoV-2. The system uses machine learning techniques, namely a feature selection algorithm and a supervised classification system, and relies on the most differentially expressed genes among the different diseases (including SARS-CoV-2) to build a four-class classifier.
Results:
The designed multiclass classifier discerns SARS-CoV-2 samples with an accuracy of 91.5%, a mean F1-score of 88.5%, and a SARS-CoV-2 AUC of 94%, using only 15 genes as predictors. A biological interpretation of the extracted gene signature reveals relations with processes involved in viral responses.
Conclusion:
This work proposes a COVID-19 gene signature composed of 15 genes, selected by applying the ‘minimum Redundancy Maximum Relevance’ (mRMR) feature selection algorithm. The integration of several RNA-Seq datasets was successful, yielding a considerably larger number of samples and therefore greater statistical significance than previous studies. A biological interpretation of the selected genes is also provided.
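The mRMR criterion named above ranks genes by their relevance to the class labels minus their redundancy with the genes already chosen. The sketch below is a minimal, hedged illustration of the greedy "difference" (MID) variant on discretized expression values using mutual information; it is not KnowSeq's implementation, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    features: dict mapping a (hypothetical) gene name -> list of discrete values
    labels:   class label of each sample
    k:        number of genes to select
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    # Start from the single most relevant gene.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for f, values in features.items():
            if f in selected:
                continue
            redundancy = sum(mutual_information(values, features[s]) for s in selected) / len(selected)
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


# g2 duplicates g1, while g3 carries complementary class information.
genes = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
print(mrmr(genes, [0, 1, 2, 3], 2))
```

All three toy genes are equally relevant, but the redundancy penalty makes the second pick the complementary g3 rather than the duplicate g2, which is the behaviour the "minimum redundancy" half of the criterion is meant to produce.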
Affiliation(s)
- Javier Bajo-Morales
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Juan Carlos Prieto-Prieto
- Nuclear Medicine Department, IMIBIC, University Hospital Reina Sofia, Menéndez Pidal Avenue, 14004, Córdoba, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
12
Blood cancer prediction using leukemia microarray gene data and hybrid logistic vector trees model. Sci Rep 2022; 12:1000. [PMID: 35046459 PMCID: PMC8770560 DOI: 10.1038/s41598-022-04835-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/21/2021] [Accepted: 12/09/2021] [Indexed: 01/21/2023]
Abstract
Blood cancer has been a growing concern during the last decade and requires early diagnosis to start proper treatment. The diagnosis process is costly and time-consuming, involving medical experts and several tests. Thus, an automatic diagnosis system for its accurate prediction is of significant importance. Diagnosis of blood cancer using leukemia microarray gene data and machine learning has become an important area of medical research. Despite research efforts, the desired accuracy and efficiency still require further improvement. This study proposes an approach for blood cancer prediction using supervised machine learning. For the current study, the leukemia microarray gene dataset containing 22,283 genes is used. ADASYN resampling and Chi-squared (Chi2) feature selection are used to address the imbalanced and high-dimensional nature of the dataset: ADASYN generates artificial data to balance the target classes, and Chi2 selects the best features out of the 22,283 to train the learning models. For classification, a hybrid logistic vector trees classifier (LVTrees) is proposed, which combines logistic regression, a support vector classifier, and an extra trees classifier. Besides extensive experiments on the datasets, a performance comparison with state-of-the-art methods was carried out to determine the significance of the proposed approach. With ADASYN and Chi2, LVTrees outperforms all other models, reaching 100% accuracy. Further, a t-test is performed to assess the statistical significance of the results. Results using k-fold cross-validation confirm the superiority of the proposed model.
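Chi2 feature selection, as described above, scores each non-negative feature by how far its per-class totals deviate from what the class frequencies alone would predict. The following is a minimal pure-Python sketch of that statistic (the same formulation used by scikit-learn's `chi2` scorer, which is commonly paired with `SelectKBest`); ADASYN resampling and the LVTrees ensemble are not shown, and the function names and toy data are hypothetical.

```python
def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class labels.

    X: list of samples, each a list of non-negative feature values
    y: class label of each sample
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    # Observed: sum of each feature's values within each class.
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            observed[label][j] += value
    totals = [sum(observed[c][j] for c in classes) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            # Expected: feature total shared out in proportion to class frequency.
            expected = y.count(c) / len(y) * totals[j]
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices of the k features with the highest chi-squared scores."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


X = [[5, 1], [6, 1], [0, 1], [1, 1]]  # feature 0 tracks the class, feature 1 is constant
y = [0, 0, 1, 1]
print(chi2_scores(X, y), select_k_best(X, y, 1))
```

Feature 1 has the same total in both classes, so its observed and expected counts coincide and it scores zero; feature 0 is concentrated in class 0 and is kept.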
13
Carrillo-Perez F, Pecho OE, Morales JC, Paravina RD, Della Bona A, Ghinea R, Pulgar R, Pérez MDM, Herrera LJ. Applications of artificial intelligence in dentistry: A comprehensive review. J Esthet Restor Dent 2021; 34:259-280. [PMID: 34842324 DOI: 10.1111/jerd.12844] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 06/16/2021] [Revised: 09/30/2021] [Accepted: 11/09/2021] [Indexed: 12/25/2022]
Abstract
OBJECTIVE To perform a comprehensive review of the use of artificial intelligence (AI) and machine learning (ML) in dentistry, providing the community with broad insight into the advances that these technologies and tools have produced, with special attention to esthetic dentistry and color research. MATERIALS AND METHODS The comprehensive review was conducted in the MEDLINE/PubMed, Web of Science, and Scopus databases, for papers published in English in the last 20 years. RESULTS Out of 3871 eligible papers, 120 were included for final appraisal. Study methodologies included deep learning (DL; n = 76), fuzzy logic (FL; n = 12), and other ML techniques (n = 32), which were mainly applied to disease identification, image segmentation, image correction, and biomimetic color analysis and modeling. CONCLUSIONS The reviewed studies reported outstanding results in the design of high-performance decision support systems for the aforementioned areas. The future of digital dentistry lies in integrated approaches that provide personalized treatments to patients. In addition, esthetic dentistry can benefit from these advances through models that allow a complete characterization of tooth color, enhancing the accuracy of dental restorations. CLINICAL SIGNIFICANCE The use of AI and ML has an increasing impact on the dental profession and is complementing the development of digital technologies and tools, with wide application in treatment planning and esthetic dentistry procedures.
Affiliation(s)
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, E.T.S.I.I.T.-C.I.T.I.C. University of Granada, Granada, Spain
- Oscar E Pecho
- Post-Graduate Program in Dentistry, Dental School, University of Passo Fundo, Passo Fundo, Brazil
- Juan Carlos Morales
- Department of Computer Architecture and Technology, E.T.S.I.I.T.-C.I.T.I.C. University of Granada, Granada, Spain
- Rade D Paravina
- Department of Restorative Dentistry and Prosthodontics, School of Dentistry, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Alvaro Della Bona
- Post-Graduate Program in Dentistry, Dental School, University of Passo Fundo, Passo Fundo, Brazil
- Razvan Ghinea
- Department of Optics, Faculty of Science, University of Granada, Granada, Spain
- Rosa Pulgar
- Department of Stomatology, Campus Cartuja, University of Granada, Granada, Spain
- María Del Mar Pérez
- Department of Optics, Faculty of Science, University of Granada, Granada, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, E.T.S.I.I.T.-C.I.T.I.C. University of Granada, Granada, Spain
14
Carrillo-Perez F, Morales JC, Castillo-Secilla D, Molina-Castro Y, Guillén A, Rojas I, Herrera LJ. Non-small-cell lung cancer classification via RNA-Seq and histology imaging probability fusion. BMC Bioinformatics 2021; 22:454. [PMID: 34551733 PMCID: PMC8456075 DOI: 10.1186/s12859-021-04376-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/23/2021] [Accepted: 09/11/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Adenocarcinoma and squamous cell carcinoma are the two most prevalent lung cancer types, and distinguishing them requires different screenings, such as the visual inspection of histology slides by an expert pathologist or the analysis of gene expression or computed tomography scans, among others. In recent years, increasing amounts of biological data have been gathered for diagnostic decision support systems (e.g. histology imaging, next-generation sequencing data, clinical information, etc.). Using all these sources to design integrative classification approaches may improve the final diagnosis of a patient, in the same way that doctors use multiple types of screening to reach a final diagnostic decision. In this work, we present a late fusion classification model using histology and RNA-Seq data for adenocarcinoma, squamous-cell carcinoma and healthy lung tissue. RESULTS The classification model improves results over using each source of information separately, reducing the diagnostic error rate by up to 64% relative to the histology-only classifier and 24% relative to the gene-expression-only classifier, and reaching a mean F1-score of 95.19% and a mean AUC of 0.991. CONCLUSIONS These findings suggest that a classification model using a late fusion methodology can considerably help clinicians distinguish between the aforementioned lung cancer subtypes, compared with using each source of information separately. This approach can also be applied to any cancer type or disease with heterogeneous sources of information.
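Late fusion, as used above, combines the class probabilities that two independently trained classifiers output for the same patient, rather than merging their input features. A minimal sketch, assuming a simple weighted average of the two probability vectors; the paper's exact fusion rule and weight are not reproduced here, and the function name and probabilities are hypothetical.

```python
def late_fusion(prob_a, prob_b, weight=0.5):
    """Fuse two per-class probability dicts by a weighted average.

    prob_a, prob_b: dict class -> probability from two independent classifiers
    weight:         contribution of prob_a; prob_b receives (1 - weight)
    Returns the predicted class and the fused probabilities.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if set(prob_a) != set(prob_b):
        raise ValueError("both classifiers must score the same classes")
    fused = {c: weight * prob_a[c] + (1.0 - weight) * prob_b[c] for c in prob_a}
    return max(fused, key=fused.get), fused


# Hypothetical outputs: the histology model hesitates, the RNA-Seq model is confident.
histology = {"LUAD": 0.45, "LUSC": 0.40, "Healthy": 0.15}
rnaseq = {"LUAD": 0.20, "LUSC": 0.70, "Healthy": 0.10}
label, fused = late_fusion(histology, rnaseq)
print(label, fused)
```

Because each input sums to one and the weights sum to one, the fused vector is still a valid probability distribution; in this toy case the confident RNA-Seq vote overturns the histology model's marginal preference.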
Affiliation(s)
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain.
- Juan Carlos Morales
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Yésica Molina-Castro
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Alberto Guillén
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
15
Mohammed M, Mwambi H, Mboya IB, Elbashir MK, Omolo B. A stacking ensemble deep learning approach to cancer type classification based on TCGA data. Sci Rep 2021; 11:15626. [PMID: 34341396 PMCID: PMC8329290 DOI: 10.1038/s41598-021-95128-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 11/25/2020] [Accepted: 07/19/2021] [Indexed: 12/13/2022]
Abstract
Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian cancers are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we propose a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform multi-class classification of these five cancers based on RNA-Seq data. The RNA-Seq gene expression data were downloaded from the Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in R. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the proposed model, with and without LASSO, against a single 1D-CNN and machine learning methods including support vector machines with radial basis function, linear, and polynomial kernels (SVM-R, SVM-L, SVM-P); artificial neural networks (ANN); k-nearest neighbors (KNN); and bagging trees. The results show that the proposed model, with and without LASSO, performs better than the other classifiers. The results also show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) perform better with under-sampling than with over-sampling. This is supported by statistical significance tests of accuracy: the differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN have p = 0.003, p < 0.001, and p < 0.001, respectively. SVM-L also differed significantly from ANN (p = 0.009). Moreover, SVM-P differed significantly from both ANN and KNN (p < 0.001 in each case). In addition, ANN differed significantly from bagging trees (p < 0.001) and from KNN (p = 0.004).
Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.
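LASSO works as a feature selector because its L1 penalty drives the coefficients of uninformative genes exactly to zero. A minimal sketch of cyclic coordinate descent for the objective (1/2n)·||y − Xβ||² + λ·||β||₁, assuming centred (standardized) features and a centred response with no intercept; the study's actual LASSO implementation and λ are not described in the abstract, and the function names and toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO on centred data (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Column 0 drives the response (y = 2 * x0); column 1 is orthogonal noise.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

The informative coefficient is shrunk from 2 to 1.5 by the penalty, while the noise coefficient is set exactly to zero, so only the first feature survives selection.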
Affiliation(s)
- Mohanad Mohammed
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa.
- Henry Mwambi
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
- Innocent B Mboya
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
- Department of Epidemiology and Biostatistics, Kilimanjaro Christian Medical University College (KCMUCo), P. O. Box 2240, Moshi, Tanzania
- Murtada K Elbashir
- College of Computer and Information Sciences, Jouf University, Sakaka, 72441, Saudi Arabia
- Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
- Bernard Omolo
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
- Division of Mathematics and Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, USA
- School of Public Health, Faculty of Health Sciences, University of Witwatersrand, Johannesburg, South Africa
16
Walter W, Haferlach C, Nadarajah N, Schmidts I, Kühn C, Kern W, Haferlach T. How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene 2021; 40:4271-4280. [PMID: 34103684 PMCID: PMC8225509 DOI: 10.1038/s41388-021-01861-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 03/12/2021] [Revised: 05/11/2021] [Accepted: 05/24/2021] [Indexed: 02/07/2023]
Abstract
Artificial intelligence (AI) is about to make itself indispensable in the health care sector. Examples of successful applications or promising approaches range from pattern recognition software that pre-processes and analyzes digital medical images, to deep learning algorithms for subtype or disease classification, to digital twin technology and in silico clinical trials. Moreover, machine-learning techniques are used to identify patterns and anomalies in electronic health records and to perform ad-hoc evaluations of data gathered from wearable health tracking devices for deep longitudinal phenotyping. In recent years, substantial progress has been made in automated image classification, in some instances even reaching superhuman levels. Despite the increasing awareness of the importance of the genetic context, diagnosis in hematology is still mainly based on the evaluation of the phenotype, either by the analysis of microscopic images of cells in cytomorphology or by the analysis of cell populations in bidimensional plots obtained by flow cytometry. Here, AI algorithms not only spot details that might escape the human eye, but might also identify entirely new ways of interpreting these images. With the introduction of high-throughput next-generation sequencing in molecular genetics, the amount of available information is increasing exponentially, priming the field for the application of machine learning approaches. The goal of all these approaches is to allow personalized and informed interventions, to enhance treatment success, to improve the timeliness and accuracy of diagnoses, and to minimize technically induced misclassifications. The potential of AI-based applications is virtually endless, but where do we stand in hematology and how far can we go?
17
Advances in Electrochemical and Acoustic Aptamer-Based Biosensors and Immunosensors in Diagnostics of Leukemia. Biosensors (Basel) 2021; 11:bios11060177. [PMID: 34073054 PMCID: PMC8227535 DOI: 10.3390/bios11060177] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/06/2021] [Revised: 05/28/2021] [Accepted: 05/28/2021] [Indexed: 12/12/2022]
Abstract
Early diagnosis of leukemia is crucial for successful therapy. Therefore, the development of rapid, sensitive, and easy-to-use methods for detecting this disease is of increasing interest, and biosensor technology is being explored for this purpose. This review includes a brief description of the methods used in current clinical diagnostics of leukemia and presents recent achievements in sensor technology based on immuno- and DNA aptamer-based electrochemical and acoustic biosensors. The comparative analysis of immuno- and aptamer-based sensors shows a significant advantage of DNA aptasensors over immunosensors in the detection of cancer cells. The acoustic technique has sensitivity comparable to that of electrochemical methods; moreover, it is label-free and provides straightforward evaluation of the signal. Several examples of sensor development are provided and discussed.
18
Castillo-Secilla D, Gálvez JM, Carrillo-Perez F, Verona-Almeida M, Redondo-Sánchez D, Ortuno FM, Herrera LJ, Rojas I. KnowSeq R-Bioc package: The automatic smart gene expression tool for retrieving relevant biological knowledge. Comput Biol Med 2021; 133:104387. [PMID: 33872966 DOI: 10.1016/j.compbiomed.2021.104387] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/24/2020] [Revised: 04/05/2021] [Accepted: 04/05/2021] [Indexed: 02/07/2023]
Abstract
The KnowSeq R/Bioc package is designed as powerful, scalable and modular software focused on automating and assembling renowned bioinformatic tools with new features and functionalities. It provides a unified environment to perform complex gene expression analyses, covering all the processing steps needed to identify a gene signature for a specific disease and to gather understandable knowledge. This process may be initiated from raw files, either available on well-known platforms or provided by the users themselves, and in either case coming from different information sources and different transcriptomic technologies. The pipeline makes use of a set of advanced algorithms, including the adaptation of a novel procedure for selecting the most representative genes in a given multiclass problem. It also embeds an intelligent system able to classify new patients, which lets the user choose among a number of well-known and widespread classification and feature selection methods in bioinformatics. Furthermore, KnowSeq is engineered to automatically generate a complete and detailed HTML report of the whole process, which is also modular and scalable. Biclass breast cancer and multiclass lung cancer case studies were addressed to rigorously assess the usability and efficiency of KnowSeq. The models built using the differentially expressed genes obtained from both experiments reach high classification rates. Furthermore, biological knowledge was extracted in terms of Gene Ontologies, pathways and related diseases, with the aim of helping the expert in the decision-making process. KnowSeq is available at Bioconductor (https://bioconductor.org/packages/KnowSeq), GitHub (https://github.com/CasedUgr/KnowSeq) and Docker (https://hub.docker.com/r/casedugr/knowseq).
Affiliation(s)
- Daniel Castillo-Secilla
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain.
- Juan Manuel Gálvez
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain
- Francisco Carrillo-Perez
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain
- Marta Verona-Almeida
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain
- Daniel Redondo-Sánchez
- Instituto de Investigación Biosanitaria de Granada, Non-Communicable Disease and Cancer Epidemiology Group, ibs.GRANADA, Avda. de Madrid, 15. Pabellón de Consultas Externas 2, 2a Planta, CP, 18012, Granada, Spain
- Francisco Manuel Ortuno
- Clinical Bioinformatics Area, Fundación Andaluza Progreso y Salud (FPS), Hospital Universitario Virgen del Rocío, Avenida Manuel Siurot s/n, 41013, Sevilla, Spain
- Luis Javier Herrera
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain
- Ignacio Rojas
- Department of Computer Architecture and Technology, University of Granada. C.I.T.I.C., Periodista Rafael Gómez Montero 2, 18014, Granada, Spain
19
Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Comput Sci 2021; 7:e562. [PMID: 34141889 PMCID: PMC8176540 DOI: 10.7717/peerj-cs.562] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 05/10/2023]
Abstract
In this paper, a novel feature selection method for microarray gene expression datasets, called the Robust Proportional Overlapping Score (RPOS), is proposed; it utilizes a robust measure of dispersion, the Median Absolute Deviation (MAD). The method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values in binary class problems: genes with a high degree of overlap between classes are discarded, and those that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods in terms of classification error, Brier score, and sensitivity on eleven gene expression datasets. Observations are classified on the different sets of genes selected by the proposed method using three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also shown. The results reveal that in most cases the proposed method outperforms the other methods.
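The core idea above, measuring how much two classes overlap on a gene using robust statistics, can be illustrated with a median/MAD "core interval" per class. This is a simplified, hypothetical sketch of that idea only, not the published RPOS score (which is more elaborate); the interval width factor `k`, the function names, and the toy data are all assumptions.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def core_interval(values, k=2.0):
    """Robust core interval: median +/- k * MAD (k is an assumed width factor)."""
    med = statistics.median(values)
    spread = k * mad(values)
    return med - spread, med + spread


def overlap_fraction(values, labels, k=2.0):
    """Fraction of all samples lying inside the overlap of the two class core intervals."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("binary labels expected")
    a = core_interval([v for v, c in zip(values, labels) if c == classes[0]], k)
    b = core_interval([v for v, c in zip(values, labels) if c == classes[1]], k)
    low, high = max(a[0], b[0]), min(a[1], b[1])
    if low > high:
        return 0.0  # the core intervals do not overlap at all
    return sum(low <= v <= high for v in values) / len(values)


def rank_genes(expression, labels, k=2.0):
    """expression: dict gene -> values; lower overlap means more discriminative."""
    return sorted(expression, key=lambda g: overlap_fraction(expression[g], labels, k))


labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "geneB": [1, 2, 3, 4, 1.5, 2.5, 3.5, 4.5],                 # classes intermixed
    "geneA": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],         # classes well separated
}
print(rank_genes(expression, labels))
```

Using the median and MAD instead of the mean and standard deviation keeps a single outlying expression value from inflating a class's interval, which is the motivation the abstract gives for a robust dispersion measure.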
Affiliation(s)
- Muhammad Hamraz
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Naz Gul
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mushtaq Raza
- Department of Computer Sciences, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Dost Muhammad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Umair Khalil
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Seema Zubair
- Department of Mathematics, Statistics and Computer Science, University of Agriculture Peshawar, Peshawar, Pakistan
- Zardad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
20
Meggendorfer M, Walter W, Haferlach T. WGS and WTS in leukaemia: A tool for diagnostics? Best Pract Res Clin Haematol 2020; 33:101190. [DOI: 10.1016/j.beha.2020.101190] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/01/2020] [Accepted: 05/27/2020] [Indexed: 12/20/2022]
21
Galvez JM, Castillo-Secilla D, Herrera LJ, Valenzuela O, Caba O, Prados JC, Ortuno FM, Rojas I. Towards Improving Skin Cancer Diagnosis by Integrating Microarray and RNA-Seq Datasets. IEEE J Biomed Health Inform 2019; 24:2119-2130. [PMID: 31871000 DOI: 10.1109/jbhi.2019.2953978] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Many clinical studies have revealed high biological similarities among different skin pathological states. These similarities make the efficient diagnosis of skin cancer difficult and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires several pipeline stages to be properly designed, from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will be needed to find better biomarkers for recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 a priori healthy skin conditions, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their diagnostic power on new samples was extensively tested with previously trained classification models. Robust performance metrics, namely the overall and mean multiclass F1-scores, exceeded 94% and 80%, respectively. Clinicians should give special attention to the highlighted multiclass DEGs that show large gene expression changes among these states, and understand their biological relationship to the different skin pathological states.
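The overall and mean multiclass F1-scores quoted above can diverge sharply when some classes are rare or poorly recognized. A minimal sketch, assuming "overall" denotes the micro-averaged F1 (pooled counts, which equals accuracy for single-label multiclass data) and "mean" the macro-average over classes; the article's exact definitions may differ, and the labels below are hypothetical.

```python
def class_counts(y_true, y_pred, c):
    """True positives, false positives and false negatives for class c."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == c and p == c for t, p in pairs)
    fp = sum(t != c and p == c for t, p in pairs)
    fn = sum(t == c and p != c for t, p in pairs)
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp, fp, fn = class_counts(y_true, y_pred, c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """F1 from counts pooled over all classes: every sample counts equally."""
    tp = fp = fn = 0
    for c in sorted(set(y_true) | set(y_pred)):
        a, b, d = class_counts(y_true, y_pred, c)
        tp, fp, fn = tp + a, fp + b, fn + d
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# One rare class ("B") is always missed.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "A", "C"]
print(micro_f1(y_true, y_pred), macro_f1(y_true, y_pred))
```

Missing the single "B" sample costs little in the pooled score but zeroes one of three per-class terms in the macro average, which is why a mean multiclass F1 is typically the harsher of the two figures.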
22
Lukaszewski M, Lukaszewski R, Kosiorowska K, Jasinski M. The use of data science to analyse physiology of oxygen delivery in the extracorporeal circulation. BMC Cardiovasc Disord 2019; 19:292. [PMID: 31835993 PMCID: PMC6909655 DOI: 10.1186/s12872-019-01301-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/08/2019] [Accepted: 12/05/2019] [Indexed: 12/19/2022]
Abstract
Background Recent scientific reports have brought to light a new concept, goal-directed perfusion (GDP), which aims to recreate physiological conditions in which the risk of end-organ malperfusion is minimized. The aim of our study was to analyse patients’ interim physiology while on cardiopulmonary bypass based on haemodynamic and tissue oxygen delivery measurements. We also aimed to create a universal formula that may help in further implementation of the GDP concept. Methods We retrospectively analysed patients operated on at the Wroclaw University Hospital between June 2017 and December 2018. Since our observations provided an extensive amount of data, including the patients’ demographics, surgery details and perfusion-related data, the Data Science methodology was applied. Results A total of 272 cardiac surgery patients (mean age 62.5 ± 12.4 years, 74% male) were included in the study. To study the relationship between haemodynamic and tissue oxygen parameters, the data for three different values of indexed oxygen delivery (DO2i: 280 ml/min/m2, 330 ml/min/m2 and 380 ml/min/m2) were evaluated. For each set DO2i, the resulting line showed the cardiac index (CI) decreasing as a function of haemoglobin (Hb) concentration. Conclusions Modern calculation tools make it possible to create a common data platform from a very large database. Using that methodology, we created models of haemodynamic compounds describing tissue oxygen delivery. The unique patterns obtained may both allow the flow to be adapted to the patient’s unique morphology, which changes over time, and contribute to a wider and safer implementation of perfusion strategies tailored to each patient’s individual needs.
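Why CI falls as Hb rises at a fixed DO2i follows from the standard oxygen delivery relation DO2i = CI × CaO2 × 10, with arterial oxygen content CaO2 = 1.34 × Hb × SaO2 + 0.003 × PaO2. The sketch below solves that relation for CI; it is not the authors' fitted model, and the Hüfner constant (1.34; some texts use 1.36 or 1.39), the default SaO2 and PaO2 values, and the function names are assumptions.

```python
def arterial_oxygen_content(hb_g_dl, sao2_fraction, pao2_mmhg, hufner=1.34):
    """CaO2 in ml O2/dl: haemoglobin-bound oxygen plus dissolved oxygen."""
    return hufner * hb_g_dl * sao2_fraction + 0.003 * pao2_mmhg


def required_cardiac_index(target_do2i, hb_g_dl, sao2_fraction=1.0, pao2_mmhg=200.0):
    """Cardiac (pump flow) index in L/min/m2 needed to reach a target DO2i in ml/min/m2.

    DO2i = CI * CaO2 * 10, where the factor 10 converts ml/dl to ml/L.
    The default SaO2 and PaO2 are illustrative assumptions, not study data.
    """
    cao2 = arterial_oxygen_content(hb_g_dl, sao2_fraction, pao2_mmhg)
    return target_do2i / (cao2 * 10)


# One descending CI-versus-Hb line per DO2i threshold evaluated in the study.
for target in (280, 330, 380):
    flows = [round(required_cardiac_index(target, hb), 2) for hb in (7, 8, 9, 10)]
    print(target, flows)
```

Because CI is inversely proportional to CaO2, each target DO2i yields a curve on which a lower haemoglobin concentration demands a proportionally higher flow.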
Affiliation(s)
- Marceli Lukaszewski
- Department of Anaesthesiology and Intensive Therapy, Wroclaw Medical University, Borowska 213, 50-556, Wroclaw, Poland.
- Kinga Kosiorowska
- Department and Clinic of Cardiac Surgery, Wroclaw Medical University, Wroclaw, Poland
- Marek Jasinski
- Department and Clinic of Cardiac Surgery, Wroclaw Medical University, Wroclaw, Poland
23
Li C, Xu J. Feature selection with the Fisher score followed by the Maximal Clique Centrality algorithm can accurately identify the hub genes of hepatocellular carcinoma. Sci Rep 2019; 9:17283. [PMID: 31754223 PMCID: PMC6872594 DOI: 10.1038/s41598-019-53471-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/15/2018] [Accepted: 11/01/2019] [Indexed: 02/08/2023]
Abstract
This study aimed to select the feature genes of hepatocellular carcinoma (HCC) with the Fisher score algorithm and to identify hub genes with the Maximal Clique Centrality (MCC) algorithm. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to examine the enrichment of terms. Gene set enrichment analysis (GSEA) was used to identify overrepresented classes of genes. After constructing a protein-protein interaction network with the feature genes, hub genes were identified with the MCC algorithm. The Kaplan–Meier plotter was used to assess patient prognosis based on the expression of the hub genes. The GO, KEGG and GSEA enrichment analyses revealed that the feature genes were closely associated with cancer and the cell cycle. Survival analysis showed that overexpression of the Fisher score–selected hub genes was associated with decreased survival time (P < 0.05). Weighted gene co-expression network analysis (WGCNA), Lasso, ReliefF and random forest were used for comparison with the Fisher score algorithm. This comparison showed that the Fisher score algorithm is superior to the Lasso and ReliefF algorithms in terms of hub gene identification and performs similarly to the WGCNA and random forest algorithms. Our results demonstrate that the Fisher score followed by the MCC algorithm can accurately identify hub genes in HCC.
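The two-stage pipeline above first ranks genes by Fisher score, the ratio of between-class to within-class variance, and then scores nodes of an interaction network by Maximal Clique Centrality. A minimal sketch, assuming the cytoHubba definition MCC(v) = Σ (|C| − 1)! over the maximal cliques C containing v, with cliques enumerated by the Bron–Kerbosch algorithm with pivoting; the network construction used in the study is not reproduced, and the function names, expression matrix, and toy network are hypothetical.

```python
import math


def fisher_scores(X, y):
    """Fisher score per feature: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        scores.append(between / within if within > 0 else (math.inf if between > 0 else 0.0))
    return scores


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; graph maps node -> set of neighbours (undirected)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph):
    """MCC(v): sum of (|C| - 1)! over the maximal cliques C that contain v."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        for v in clique:
            scores[v] += math.factorial(len(clique) - 1)
    return scores


# Feature 0 separates the classes, feature 1 does not.
print(fisher_scores([[1, 1], [2, 6], [5, 2], [6, 5]], [0, 0, 1, 1]))
# Toy network: triangle A-B-C plus a pendant node D attached to A.
network = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(mcc_scores(network))
```

In the toy network, A sits in both maximal cliques ({A, B, C} and {A, D}) and therefore receives the highest MCC, which is the sense in which MCC flags "hub" genes.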
Affiliation(s)
- Chengzhang Li
- College of Life Science, Henan Normal University, Xinxiang, 453007, Henan Province, China; State Key Laboratory Cultivation Base for Cell Differentiation Regulation, Henan Normal University, Xinxiang, 453007, Henan Province, China; Department of Physiology and Neurobiology, School of Basic Medical Sciences, Xinxiang Medical University, Xinxiang, 453003, Henan Province, China
- Jiucheng Xu
- Engineering Lab of Intelligence Business & Internet of Things, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, Henan Province, China; State Key Laboratory Cultivation Base for Cell Differentiation Regulation, Henan Normal University, Xinxiang, 453007, Henan Province, China.