1
Bahrambanan F, Alizamir M, Moradveisi K, Heddam S, Kim S, Kim S, Soleimani M, Afshar S, Taherkhani A. The development of an efficient artificial intelligence-based classification approach for colorectal cancer response to radiochemotherapy: deep learning vs. machine learning. Sci Rep 2025; 15:62. [PMID: 39748016 PMCID: PMC11696929 DOI: 10.1038/s41598-024-84023-w] [Received: 06/11/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025]
Abstract
Colorectal cancer (CRC) is a form of cancer that affects the colon and rectum. It typically begins as a small abnormal growth known as a polyp, which can be either non-cancerous or cancerous. Because colorectal cancer is the second deadliest cancer after lung cancer, early detection can be highly beneficial. The widely accepted standard treatment for locally advanced colorectal cancer is chemoradiotherapy. In this study, seven artificial intelligence models (decision tree, K-nearest neighbors, AdaBoost, random forest, Gradient Boosting, multi-layer perceptron, and convolutional neural network) were implemented to classify patients as responders or non-responders to radiochemotherapy. To identify potential predictors (genes), three feature selection strategies were employed: mutual information, F-classif, and Chi-Square. Based on these feature selection methods, four scenarios were developed in which five, ten, twenty, and thirty features were selected to design a more accurate classification paradigm. The results show that random forest, Gradient Boosting, decision tree, and K-nearest neighbors provided the most accurate results, with an accuracy of 93.8%. Among the feature selection methods, mutual information and F-classif gave the best results, while Chi-Square gave the worst. The proposed artificial intelligence models can therefore serve as a robust approach for classifying colorectal cancer response to radiochemotherapy in medical studies.
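As a rough illustration of the univariate filter step, the standard-library Python sketch below scores each gene with a one-way ANOVA F statistic (the quantity scikit-learn's `f_classif` reports) and with a chi-square score computed the way scikit-learn's `chi2` does (observed versus expected per-class feature totals, non-negative features only), then keeps the top k. Mutual information is omitted because its usual estimator for continuous data is more involved. The function names and toy data are hypothetical; this does not reproduce the authors' pipeline.

```python
from collections import defaultdict


def _group_by_class(column, labels):
    """Split one feature column into per-class lists of values."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[label].append(value)
    return groups


def anova_f(column, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = _group_by_class(column, labels)
    n, k = len(column), len(groups)
    grand_mean = sum(column) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((x - means[c]) ** 2 for c, g in groups.items() for x in g)
    if ss_within == 0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi2_score(column, labels):
    """Chi-square score of a non-negative feature: observed vs expected class totals."""
    groups = _group_by_class(column, labels)
    n, total = len(column), sum(column)
    score = 0.0
    for g in groups.values():
        observed = sum(g)
        expected = total * len(g) / n  # class frequency times overall feature total
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def select_top_k(rows, labels, k, scorer):
    """Return the indices of the k highest-scoring feature columns."""
    columns = [list(col) for col in zip(*rows)]
    scores = [scorer(col, labels) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 barely does.
X = [[5, 1], [6, 2], [5, 1], [1, 2], [0, 1], [1, 2]]
y = [1, 1, 1, 0, 0, 0]
print(select_top_k(X, y, 1, anova_f))     # [0]
print(select_top_k(X, y, 1, chi2_score))  # [0]
```

In the study's scenarios, `k` would correspond to the five, ten, twenty, or thirty selected genes.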
Affiliation(s)
- Fatemeh Bahrambanan
- Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran.
- Meysam Alizamir
- Institute of Research and Development, Duy Tan University, Da Nang, Vietnam.
- School of Engineering & Technology, Duy Tan University, Da Nang, Vietnam.
- Kayhan Moradveisi
- Civil Engineering Department, University of Kurdistan, Sanandaj, Iran
- Salim Heddam
- Faculty of Science, Agronomy Department, Hydraulics Division, University 20 Août 1955, Route El Hadaik BP 26, 21000, Skikda, Algeria
- Sungwon Kim
- Department of Railroad Construction and Safety Engineering, Dongyang University, Yeongju, 36040, Republic of Korea
- Seunghyun Kim
- Department of Biology, University of California San Diego, San Diego, CA, 92093, USA
- Meysam Soleimani
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Hamadan University of Medical Sciences, Hamadan, Iran
- Saeid Afshar
- Department of Molecular Medicine and Genetics, Medical School, Hamadan University of Medical Sciences, Hamadan, Iran
- Amir Taherkhani
- Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
2
Verma J, Sandhu A, Popli R, Kumar R, Khullar V, Kansal I, Sharma A, Garg K, Kashyap N, Aurangzeb K. From slides to insights: Harnessing deep learning for prognostic survival prediction in human colorectal cancer histology. Open Life Sci 2023; 18:20220777. [PMID: 38152577 PMCID: PMC10751997 DOI: 10.1515/biol-2022-0777] [Received: 08/16/2023] [Revised: 10/24/2023] [Accepted: 10/26/2023] [Indexed: 12/29/2023]
Abstract
Prognostic survival prediction in colorectal cancer (CRC) plays a crucial role in guiding treatment decisions and improving patient outcomes. In this research, we explore the application of deep learning techniques to predict survival outcomes from histopathological images of human colorectal cancer. We present a retrospective multicenter study using a dataset of 100,000 non-overlapping image patches from hematoxylin and eosin (H&E)-stained histological images of CRC and normal tissue. The dataset includes diverse tissue classes: adipose, background, debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma, and colorectal adenocarcinoma epithelium. For survival prediction, we employ several deep learning architectures, including a convolutional neural network (CNN), DenseNet201, InceptionResNetV2, VGG16, VGG19, and Xception, each trained on this multicenter retrospective dataset. Extensive preprocessing, including image normalization with Macenko's method and data augmentation, is undertaken to optimize model performance. The experimental findings demonstrate the effectiveness of deep learning models for prognostic survival prediction: the models achieve high accuracy, precision, recall, and validation metrics, showing that they capture histological patterns associated with prognosis. Visualization techniques are employed to interpret the models' decision-making process, highlighting the features and regions that contribute to survival predictions. Accurate prediction of survival outcomes in CRC can aid personalized medicine and clinical decision-making, facilitating treatment plans tailored to individual patients.
The identification of important histological features and biomarkers provides valuable insights into disease mechanisms and may lead to the discovery of novel prognostic indicators. The transparency and explainability of the models enhance trust and acceptance, fostering their integration into clinical practice. This research demonstrates the potential of deep learning models for prognostic survival prediction in human colorectal cancer histology. The findings contribute to the understanding of disease progression and offer practical applications in personalized medicine. By combining deep learning with histopathological analysis, we pave the way for improved patient care, clinical decision support, and advances in prognostic prediction in CRC.
Affiliation(s)
- Jyoti Verma
- Department of Computer Science and Engineering, Punjabi University, Patiala, India
- Archana Sandhu
- MM Institute of Computer Technology and Business Management, Maharishi Markandeshwar (Deemed to be University), Mullana-Ambala, Haryana, 134007, India
- Renu Popli
- Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Rajeev Kumar
- Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Vikas Khullar
- Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Isha Kansal
- Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Ashutosh Sharma
- Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun 248007, Uttarakhand, India
- Kanwal Garg
- Department of Computer Science and Applications, Kurukshetra University, Kurukshetra, 136119, Haryana, India
- Neeru Kashyap
- Department of ECE, M.M. Engineering College, Maharishi Markandeshwar (Deemed to be University), Mullana, Ambala, Haryana 134007, India
- Khursheed Aurangzeb
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
3
Al-Rajab M, Lu J, Xu Q, Kentour M, Sawsa A, Shuweikeh E, Joy M, Arasaradnam R. A hybrid machine learning feature selection model-HMLFSM to enhance gene classification applied to multiple colon cancers dataset. PLoS One 2023; 18:e0286791. [PMID: 37917732 PMCID: PMC10621932 DOI: 10.1371/journal.pone.0286791] [Received: 01/27/2023] [Accepted: 05/20/2023] [Indexed: 11/04/2023]
Abstract
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopy, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification, analysing genetic data or patient demographics and medical history to predict the likelihood of colon cancer. However, because of variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model, HMLFSM (Hybrid Machine Learning Feature Selection Model), to improve colon cancer gene classification. We developed a multifilter hybrid model with a two-phase feature selection approach, combining Information Gain (IG) with Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) with Particle Swarm Optimization (PSO). We tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models, with accuracies of 95%, ~97%, and ~94% on datasets 1, 2, and 3, respectively. The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that directly influence the classification process. Based on our experiments and a literature review of colon cancer gene analysis, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance.
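To make the mRMR criterion concrete, here is a minimal standard-library sketch of greedy mRMR in its common difference form (relevance to the label minus mean mutual information with already-selected genes) on discrete features. It covers only the mRMR idea; the IG + GA phase and the PSO coupling of HMLFSM are not reproduced, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): relevance minus mean redundancy.

    `features` is a list of discrete feature columns; returns selected column indices.
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


a = [0, 0, 0, 0, 1, 1, 1, 1]  # informative gene
b = [0, 1, 0, 1, 0, 1, 0, 1]  # weakly informative, independent of a
y = [0, 0, 0, 1, 1, 1, 1, 1]
print(mrmr([a, a, b], y, 2))  # [0, 2]
```

Column 1 is an exact copy of column 0, so once column 0 is chosen its redundancy outweighs its relevance and the weaker but independent column 2 is picked instead, which is the behaviour mRMR is designed to produce.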
Affiliation(s)
- Murad Al-Rajab
- College of Engineering, Abu Dhabi University, Abu Dhabi, United Arab Emirates
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mohamed Kentour
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Ahlam Sawsa
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Bradford Teaching Hospitals NHS Foundation Trust, Bradford, United Kingdom
- Emad Shuweikeh
- School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mike Joy
- University of Warwick, Coventry, United Kingdom
4
Khatun R, Akter M, Islam MM, Uddin MA, Talukder MA, Kamruzzaman J, Azad AKM, Paul BK, Almoyad MAA, Aryal S, Moni MA. Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data. Genes (Basel) 2023; 14:1802. [PMID: 37761941 PMCID: PMC10530870 DOI: 10.3390/genes14091802] [Received: 08/15/2023] [Revised: 09/10/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning. However, the high dimensionality of microarray gene expression data makes it challenging to identify the genes that matter for cancer diagnosis. Many feature selection algorithms improve cancer diagnosis by selecting optimal features. To address this challenge, this article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT). The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms into an ensemble model. The proposed method was tested on three benchmark datasets and compared with existing built-in ensemble models. Our model achieved higher accuracy: 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. The study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared with the original data. The proposed approach surpasses existing strategies in accuracy and stability, detecting vital genes with higher precision and stability than other existing methods, and significantly impacts the development of ML-based gene analysis.
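The abstract does not state the exact rank-aggregation rule or the voting weights, so the sketch below assumes two common choices: features are ordered by their mean position across the individual rankings, and the ensemble takes a weighted average of each model's class probabilities (soft voting) and predicts the argmax. The rankings, probabilities, and weights are hypothetical placeholders for the outputs of the filter methods and of the SVM, k-NN, and decision tree models.

```python
from collections import defaultdict


def aggregate_ranks(rankings):
    """Combine several best-first feature rankings by mean position (lower is better).

    Assumes every ranking orders the same set of features.
    """
    positions = defaultdict(list)
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            positions[feature].append(pos)
    mean_pos = {f: sum(p) / len(p) for f, p in positions.items()}
    return sorted(mean_pos, key=lambda f: (mean_pos[f], f))


def weighted_soft_vote(prob_vectors, weights):
    """Weighted average of per-model class probabilities; returns (class, averaged probs)."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    avg = [sum(w * p[c] for p, w in zip(prob_vectors, weights)) / total
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Hypothetical best-first rankings from three filter methods.
rankings = [["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g1", "g3", "g2"]]
print(aggregate_ranks(rankings))  # ['g1', 'g2', 'g3']

# Hypothetical class probabilities from SVM, k-NN and decision tree, with assumed weights.
label, probs = weighted_soft_vote([[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]], [0.5, 0.2, 0.3])
print(label)  # 0
```

Weighting lets a more reliable base model dominate: here the averaged probabilities are 0.525 and 0.475, so class 0 wins even though the second model favours class 1.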
Affiliation(s)
- Rabea Khatun
- Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh
- Maksuda Akter
- Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Manowarul Islam
- Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
- School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Md. Alamin Talukder
- Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Joarder Kamruzzaman
- Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3842, Australia
- AKM Azad
- Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11564, Saudi Arabia
- Bikash Kumar Paul
- Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
- Department of Software Engineering, Daffodil International University (DIU), Dhaka 1342, Bangladesh
- Muhammad Ali Abdulllah Almoyad
- Department of Basic Medical Sciences, College of Applied Medical Sciences in Khamis Mushyt, King Khalid University, Abha 61412, Saudi Arabia
- Sunil Aryal
- School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Mohammad Ali Moni
- Artificial Intelligence & Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
5
Alshammari A. Ensemble recurrent neural network with whale optimization algorithm-based DNA sequence classification for medical applications. Soft comput 2023:1-14. [PMID: 37362270 PMCID: PMC10231859 DOI: 10.1007/s00500-023-08435-y] [Accepted: 05/04/2023] [Indexed: 06/28/2023]
Abstract
The modern data-driven era has facilitated the gathering of large quantities of biomedical and clinical data. Deoxyribonucleic acid (DNA) gene expression datasets have become a vital focus for the research community because of their capability to detect pathogens via 'biomarkers', particular modifications in the gene sequence that characterize a specific pathogen. Metaheuristic-based feature selection (FS) efficiently filters only the pertinent genes out of large feature sets, reducing data storage and computation requirements. This paper adopts the whale optimization algorithm for FS in high-dimensional (HD) microarray data, effectively propagating candidate solutions toward the global optimum over sufficient iterations. The selected data are classified by an ensemble recurrent neural network (ERNN) that combines long short-term memory, bidirectional long short-term memory, and gated recurrent units. In comparisons with diverse advanced methods, the proposed ERNN attains 99.59% precision and 99.59% accuracy.
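The following is a minimal, simplified binary whale optimization sketch for gene selection, assuming the standard WOA update rules (shrinking encircling of the best whale, exploration around a random whale, and the logarithmic spiral) plus a sigmoid transfer function that turns continuous positions into 0/1 gene masks. Using one scalar A and C per whale and clamping positions are simplifications of this sketch. The fitness function is a hypothetical stand-in; in practice it would typically be a classifier's validation error plus a penalty on the number of selected genes. The paper's ERNN classifier is not reproduced.

```python
import math
import random


def binary_woa(fitness, n_features, n_whales=10, n_iter=30, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks with a simplified binary WOA."""
    rng = random.Random(seed)

    def to_mask(position):
        # Sigmoid transfer: coordinate x becomes 1 with probability 1 / (1 + e^-x).
        return [1 if rng.random() < 1 / (1 + math.exp(-x)) else 0 for x in position]

    positions = [[rng.uniform(-4, 4) for _ in range(n_features)] for _ in range(n_whales)]
    masks = [to_mask(pos) for pos in positions]
    scores = [fitness(m) for m in masks]
    best = min(range(n_whales), key=lambda i: scores[i])
    best_pos, best_mask, best_score = positions[best][:], masks[best], scores[best]

    for t in range(n_iter):
        a = 2 - 2 * t / n_iter  # shrinks linearly from 2 toward 0
        for i in range(n_whales):
            A, C = 2 * a * rng.random() - a, 2 * rng.random()
            p, ell = rng.random(), rng.uniform(-1, 1)
            x = positions[i]
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best_pos if abs(A) < 1 else positions[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Bubble-net spiral around the best whale (spiral constant b = 1).
                new = [abs(bp - xi) * math.exp(ell) * math.cos(2 * math.pi * ell) + bp
                       for bp, xi in zip(best_pos, x)]
            positions[i] = [max(-6.0, min(6.0, v)) for v in new]  # keep sigmoid well-behaved
            mask = to_mask(positions[i])
            score = fitness(mask)
            if score < best_score:
                best_pos, best_mask, best_score = positions[i][:], mask, score
    return best_mask, best_score


# Hypothetical fitness: mismatches against a "true" gene subset plus a small size penalty.
target = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, target)) + 0.01 * sum(mask)


mask, score = binary_woa(toy_fitness, n_features=8, seed=1)
print(mask, score)
```

Because all randomness flows through one seeded `random.Random`, a given seed reproduces the same search, which is convenient when comparing fitness definitions.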
Affiliation(s)
- Abdulaziz Alshammari
- Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
6
Hou X, Hou J, Huang G. Bi-dimensional principal gene feature selection from big gene expression data. PLoS One 2022; 17:e0278583. [PMID: 36477666 PMCID: PMC9728919 DOI: 10.1371/journal.pone.0278583] [Received: 05/09/2022] [Accepted: 11/20/2022] [Indexed: 12/12/2022]
Abstract
Gene expression sample data, which usually contains massive gene expression profiles, is commonly used for disease-related gene analysis. Selecting relevant genes from a huge number of genes is a fundamental step in applications of gene expression data. As more and more genes are detected, gene expression data grow ever larger, which challenges the computational efficiency of extracting relevant and important genes. In this paper, we provide a novel Bi-dimensional Principal Feature Selection (BPFS) method for efficiently extracting critical genes from big gene expression data. It applies principal component analysis (PCA) successively to the sample and gene domains, aiming to extract relevant gene features and reduce redundancy while losing less information. Experimental results on four real-world cancer gene expression datasets show that the proposed BPFS method greatly reduces data size and nearly doubles processing speed compared with counterpart methods, while maintaining better accuracy and effectiveness.
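The abstract does not spell out the BPFS procedure, so the sketch below shows only a generic ingredient it builds on: PCA in the gene domain, computed here by power iteration on the gene-gene covariance matrix, with genes ranked by their absolute loading on the first principal component. It is not the BPFS algorithm (which also applies PCA to the sample domain), and the function names and data are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the gene-gene covariance matrix via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def rank_genes_by_pc1(rows):
    """Order gene (column) indices by absolute loading on the first component."""
    loadings = first_principal_component(rows)
    return sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)


# Rows are samples, columns are genes; genes 0 and 1 co-vary, gene 2 is nearly flat.
X = [[2.0, 2.1, 5.0], [4.0, 3.9, 5.1], [6.0, 6.2, 4.9], [8.0, 7.8, 5.0]]
print(rank_genes_by_pc1(X))
```

The nearly constant gene 2 receives a tiny loading and is ranked last, which is the redundancy-reducing effect a PCA-based selector relies on.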
Affiliation(s)
- Xiaoqian Hou
- School of Information Technology, Deakin University, Melbourne, Victoria, Australia
- Jingyu Hou
- School of Information Technology, Deakin University, Melbourne, Victoria, Australia
- Guangyan Huang
- School of Information Technology, Deakin University, Melbourne, Victoria, Australia
7
Jha A, Quesnel-Vallières M, Wang D, Thomas-Tikhonenko A, Lynch KW, Barash Y. Identifying common transcriptome signatures of cancer by interpreting deep learning models. Genome Biol 2022; 23:117. [PMID: 35581644 PMCID: PMC9112525 DOI: 10.1186/s13059-022-02681-3] [Received: 11/19/2021] [Accepted: 04/27/2022] [Indexed: 01/01/2023]
Abstract
Background: Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to directly contribute to oncogenesis vary widely between tumor types, but common gene signatures that relate to core cancer pathways have also been identified. It is not clear, however, whether there exist additional sets of genes or transcriptomic features that are less well known in cancer biology but that are also commonly deregulated across several cancer types.
Results: Here, we agnostically identify transcriptomic features that are commonly shared between cancer types, using 13,461 RNA-seq samples from 19 normal tissue types and 18 solid tumor types to train three feed-forward neural networks, based on protein-coding gene expression, lncRNA expression, or splice junction use, to distinguish between normal and tumor samples. All three models recognize transcriptome signatures that are consistent across tumors. Analysis of attribution values extracted from our models reveals that genes that are commonly altered in cancer by expression or splicing variations are under strong evolutionary and selective constraints. Importantly, we find that genes composing our cancer transcriptome signatures are not frequently affected by mutations or genomic alterations and that their functions differ widely from those of the genes genetically associated with cancer.
Conclusions: Our results highlight that deregulation of RNA-processing genes and aberrant splicing are pervasive features on which core cancer pathways might converge across a large array of solid tumor types.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02681-3.
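The abstract does not name the attribution method used to interpret the networks. As one common, well-defined choice, the sketch below computes integrated gradients for a toy logistic "tumor vs normal" model on made-up expression values, using a midpoint Riemann sum along the straight path from a baseline to the input. The model, weights, and baseline are hypothetical; the study itself used feed-forward neural networks.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def toy_model(x, weights, bias):
    """Hypothetical tumor-vs-normal score: logistic regression on expression values."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann approximation of integrated gradients for toy_model."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        p = toy_model(point, weights, bias)
        for i, w in enumerate(weights):
            # d/dx_i sigmoid(w . x + bias) = p * (1 - p) * w_i
            avg_grad[i] += p * (1 - p) * w / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x, baseline = [2.0, 0.5, -1.0], [0.0, 0.0, 0.0]
weights, bias = [1.5, -0.5, 0.2], -0.3
attr = integrated_gradients(x, baseline, weights, bias)
# Completeness: attributions sum (approximately) to the change in model output.
print(sum(attr), toy_model(x, weights, bias) - toy_model(baseline, weights, bias))
```

The completeness property (attributions summing to the output difference from the baseline) is what makes per-gene attribution values comparable across samples.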
Affiliation(s)
- Anupama Jha
- Department of Computer and Information Science, School of Engineering and Applied Science, Philadelphia, USA.
- Mathieu Quesnel-Vallières
- Department of Genetics, Philadelphia, USA.
- Department of Biochemistry and Biophysics, Philadelphia, USA.
- David Wang
- Department of Genetics, Philadelphia, USA
- Andrei Thomas-Tikhonenko
- Department of Pathology and Laboratory Medicine, Philadelphia, USA.
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA.
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, USA
- Kristen W Lynch
- Department of Biochemistry and Biophysics, Philadelphia, USA
- Yoseph Barash
- Department of Computer and Information Science, School of Engineering and Applied Science, Philadelphia, USA.
- Department of Genetics, Philadelphia, USA.