1
Soleimani M, Harooni A, Erfani N, Khan AR, Saba T, Bahaj SA. Classification of cancer types based on microRNA expression using a hybrid radial basis function and particle swarm optimization algorithm. Microsc Res Tech 2024; 87:1052-1062. [PMID: 38230557 DOI: 10.1002/jemt.24492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 11/27/2023] [Accepted: 12/25/2023] [Indexed: 01/18/2024]
Abstract
The diagnosis and treatment of cancer is one of the most challenging aspects of the medical profession, despite advances in disease diagnosis. MicroRNAs are small noncoding RNA molecules involved in regulating gene expression and are associated with several cancer types. Therefore, the analysis of microRNA data has become one of the most important areas of cancer research in recent years. This paper presents an improved method for cancer-type classification based on microRNA expression data using a hybrid radial basis function (RBF) and particle swarm optimization (PSO) algorithm. Two datasets containing microRNA information were used, and preprocessing and normalization operations were performed on the raw data. Feature selection was carried out using the PSO algorithm, which can identify the most relevant and informative features in the data and help to prioritize them. Using a PSO algorithm for feature selection is an effective approach to microRNA analysis. This enhances the accuracy and reliability of cancer-type classifications based on microRNA expression data. Using an improved RBF neural network classifier, the proposed method achieved accuracies of 95% and 91% on the two datasets, respectively, with an average of 93%. These results demonstrate that the proposed method outperforms previous works. RESEARCH HIGHLIGHTS: The aim is to enhance the accuracy of cancer-type classification based on microRNA expression data. We present a minimal feature selection method that uses particle swarm optimization to reduce computational load and a radial basis function network to improve accuracy.
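The feature-selection step described in this abstract can be illustrated with a minimal binary PSO sketch. This is not the authors' implementation: the sigmoid transfer function, the parameter values, the `repair` helper, and the nearest-centroid fitness (a stand-in for the RBF network classifier) are all assumptions chosen to keep the example self-contained.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[bool]


def make_fitness(X: Sequence[Sequence[float]], y: Sequence[int],
                 penalty: float = 0.01) -> Callable[[Mask], float]:
    """Fitness = training accuracy of a nearest-centroid classifier
    (assumed stand-in for an RBF network) minus a small size penalty."""
    classes = sorted(set(y))

    def fitness(mask: Mask) -> float:
        idx = [d for d, keep in enumerate(mask) if keep]
        centroids = {}
        for c in classes:
            rows = [x for x, lab in zip(X, y) if lab == c]
            centroids[c] = [sum(r[d] for r in rows) / len(rows) for d in idx]
        correct = 0
        for x, lab in zip(X, y):
            pred = min(classes, key=lambda c: sum(
                (x[d] - m) ** 2 for d, m in zip(idx, centroids[c])))
            correct += pred == lab
        return correct / len(X) - penalty * len(idx) / len(mask)

    return fitness


def binary_pso(n_features: int, fitness: Callable[[Mask], float],
               n_particles: int = 12, n_iter: int = 40,
               w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               seed: int = 0) -> Tuple[Mask, float]:
    """Search for a feature mask that maximizes `fitness`."""
    rng = random.Random(seed)

    def repair(mask: Mask) -> Mask:
        # Hypothetical helper: never evaluate an empty feature subset.
        if not any(mask):
            mask[rng.randrange(n_features)] = True
        return mask

    pos = [repair([rng.random() < 0.5 for _ in range(n_features)])
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid transfer: velocity becomes a selection probability.
                pos[i][d] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d]))
            repair(pos[i])
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy data: only the first column separates the two classes.
    X = [[0.0, 5.0, 1.0], [0.2, 5.0, 1.0], [1.0, 5.0, 1.0], [1.2, 5.0, 1.0]]
    y = [0, 0, 1, 1]
    mask, score = binary_pso(3, make_fitness(X, y))
    print(mask, score)
```

The size penalty in the fitness is one simple way to favor small feature subsets, which matches the stated goal of reducing computational load; the weight of that penalty is an arbitrary choice here.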
Affiliation(s)
- Masoumeh Soleimani
- Department of Mathematics and Statistical Sciences, Clemson University, Clemson, South Carolina, USA
- Aryan Harooni
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
- Nasim Erfani
- Department of Computer Engineering, Dolatabad Branch, Islamic Azad University, Isfahan, Iran
- Amjad Rehman Khan
- Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Tanzila Saba
- Artificial Intelligence & Data Analytics Lab, CCIS, Prince Sultan University, Riyadh, Saudi Arabia
- Saeed Ali Bahaj
- MIS Department, College of Business Administration, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
2
Islam A, Belhaouari SB, Rehman AU, Bensmail H. KNNOR: An oversampling technique for imbalanced datasets. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2021.108288] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/02/2022]
3
Makrogiannis S, Zheng K, Harris C. Discriminative Localized Sparse Approximations for Mass Characterization in Mammograms. Front Oncol 2021; 11:725320. [PMID: 35036353 PMCID: PMC8755640 DOI: 10.3389/fonc.2021.725320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 12/06/2021] [Indexed: 11/13/2022]
Abstract
The most common form of cancer among women in both developed and developing countries is breast cancer. The early detection and diagnosis of this disease is significant because it may reduce the number of deaths caused by breast cancer and improve the quality of life of those affected. Computer-aided detection (CADe) and computer-aided diagnosis (CADx) methods have shown promise in recent years for aiding in the human expert reading analysis and improving the accuracy and reproducibility of pathology results. One significant application of CADe and CADx is for breast cancer screening using mammograms. In image processing and machine learning research, relevant results have been produced by sparse analysis methods to represent and recognize imaging patterns. However, application of sparse analysis techniques to the biomedical field is challenging, as the objects of interest may be obscured because of contrast limitations or background tissues, and their appearance may change because of anatomical variability. We introduce methods for label-specific and label-consistent dictionary learning to improve the separation of benign breast masses from malignant breast masses in mammograms. We integrated these approaches into our Spatially Localized Ensemble Sparse Analysis (SLESA) methodology. We performed 10- and 30-fold cross validation (CV) experiments on multiple mammography datasets to measure the classification performance of our methodology and compared it to deep learning models and conventional sparse representation. Results from these experiments show the potential of this methodology for separation of malignant from benign masses as a part of a breast cancer screening workflow.
Affiliation(s)
- Sokratis Makrogiannis
- Math Imaging and Visual Computing Lab, Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, Dover, DE, United States
4
Multi-category multi-state information ensemble-based classification method for precise diagnosis of three cancers. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06211-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
5
Raja Sree S, Kunthavai A. Hubness weighted svm ensemble for prediction of breast cancer subtypes. Technol Health Care 2021; 30:565-578. [PMID: 34397436 DOI: 10.3233/thc-212825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
BACKGROUND Breast cancer is a major disease causing panic among women worldwide. Since gene mutations are the root cause of cancer development, analyzing gene expressions can give more insight into the various phenotypes relevant to cancer treatment. Breast cancer subtype prediction from gene expression data can provide more information for cancer treatment decisions. OBJECTIVE Gene expressions are complex to analyze due to their high-dimensional nature. Machine learning algorithms such as k-Nearest Neighbors, Support Vector Machine (SVM) and Random Forest are used with feature selection for prediction of breast cancer subtypes. The prediction accuracy of the existing methods is affected by the high-dimensional nature of gene expressions. The objective of the work is to propose an efficient algorithm for the prediction of breast cancer subtypes from gene expression. METHODS For subtype prediction, a novel Hubness Weighted Support Vector Machine algorithm (HWSVM) is proposed, which uses the bad hubness score as a weight measure to handle the outliers in the data. Based on the various subtypes, features are projected into seven different feature sets and an Ensemble based Hubness Aware Weighted Support Vector Machine (HWSVMEns) is implemented for breast cancer subtype prediction. RESULTS The proposed algorithms have been compared with the classical SVM and other traditional algorithms such as Random Forest and k-Nearest Neighbor, and also with various gene selection methods. CONCLUSIONS Experimental results show that the proposed HWSVM outperforms other algorithms in terms of accuracy, precision, recall and F1 score due to the hubness weightage scheme and the ensemble approach. The experiments have shown an average accuracy of 92% across various gene expression datasets.
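As a rough illustration of the bad hubness score mentioned in this abstract, the sketch below counts, for each sample, how often it appears among the k nearest neighbors of samples carrying a different label. The abstract does not give the weighting formula, so the mapping from score to sample weight (`1 / (1 + score)`) is purely an assumption, as are the function names.

```python
import math
from typing import List, Sequence


def bad_hubness(points: Sequence[Sequence[float]], labels: Sequence[int],
                k: int) -> List[int]:
    """Count how often each point is a k-nearest neighbor of a point
    whose label differs from its own (its "bad" occurrences)."""
    n = len(points)
    bad = [0] * n
    for i in range(n):
        # Indices of all other points, ordered by Euclidean distance to i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            if labels[j] != labels[i]:
                bad[j] += 1
    return bad


def hubness_weights(bad: Sequence[int]) -> List[float]:
    """Assumed weighting: samples with more bad occurrences get less weight."""
    return [1.0 / (1.0 + b) for b in bad]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,)]
    labs = [0, 0, 1, 1, 1]
    scores = bad_hubness(pts, labs, k=1)
    print(scores, hubness_weights(scores))
```

Weights of this kind could be passed as per-sample weights to an SVM trainer so that likely outliers influence the decision boundary less; whether HWSVM applies them in that way is not stated in the abstract.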
Affiliation(s)
- S Raja Sree
- Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, India
- A Kunthavai
- Department of Computer Science and Engineering, Coimbatore Institute of Technology, Coimbatore, India
6
Velusamy D, Ramasamy K. Ensemble of heterogeneous classifiers for diagnosis and prediction of coronary artery disease with reduced feature subset. Comput Methods Programs Biomed 2021; 198:105770. [PMID: 33027698 DOI: 10.1016/j.cmpb.2020.105770] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/09/2020] [Accepted: 09/19/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND AND OBJECTIVE Coronary artery disease (CAD) is considered one of the most prominent health issues causing high mortality in the world population. Hence, earlier diagnosis and prediction of CAD is essential for the proper medication of patients. The objective of this study is to develop a machine learning algorithm that will help in accurate diagnosis of CAD. METHODS In this paper, we have proposed a novel heterogeneous ensemble method combining three base classifiers, namely K-Nearest Neighbour, Random Forest, and Support Vector Machine, for effective diagnosis of CAD. The results of the base classifiers are combined using an ensemble voting technique based on average voting (AVEn), majority voting (MVEn), and weighted-average voting (WAVEn) for prediction of CAD. The random forest-based Boruta wrapper feature selection algorithm and the feature importance of SVM are used for relevant feature selection based on attribute importance and rank. RESULTS The proposed ensemble algorithm is developed using 5 features selected based on feature importance, and the performance of the algorithm is evaluated using the Z-Alizadeh Sani dataset. Further, the dataset is balanced using the Synthetic Minority Over-sampling Technique and its performance is evaluated. The result analysis shows that the WAVEn algorithm achieves better classification accuracy, sensitivity, specificity and precision of 98.97%, 100%, 96.3% and 98.3%, respectively, for the original dataset. The WAVEn algorithm applied to the balanced dataset achieves 100% accuracy, sensitivity, specificity and precision in diagnosing CAD. To the best of the authors' knowledge, the accuracy achieved by WAVEn is the highest when compared with the state-of-the-art algorithms in the literature for both the original and balanced datasets.
CONCLUSIONS The statistical results prove the robustness of the WAVEn algorithm in reliably discriminating the CAD patients from healthy ones with high precision, and therefore it can be used for developing a decision support system for diagnosing CAD at an early stage.
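The voting schemes named in this abstract can be sketched in a few lines. The example below is only an illustration under assumptions: the function names, the probability vectors, and the weights are hypothetical, and the abstract does not say how the paper derives its classifier weights.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def weighted_average_vote(probas: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> Tuple[int, List[float]]:
    """Combine per-classifier class probabilities with a weighted mean.

    Returns the winning class index and the combined probabilities.
    Equal weights reduce this to plain average voting.
    """
    if len(probas) != len(weights):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    combined = [sum(w * p[c] for p, w in zip(probas, weights)) / total
                for c in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__), combined


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent predicted label."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical [P(healthy), P(CAD)] outputs from KNN, RF and SVM.
    probas = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55]]
    label, combined = weighted_average_vote(probas, weights=[1.0, 2.0, 1.0])
    print(label, combined)
    print(majority_vote([0, 1, 1]))
```

A common design choice is to set each weight from the base classifier's validation accuracy, so that stronger classifiers contribute more to the combined probability.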
Affiliation(s)
- Durgadevi Velusamy
- Department of Computer Science and Engineering, M.Kumarasamy College of Engineering, Karur, Tamilnadu, 639 113, India.
- Karthikeyan Ramasamy
- Department of Electrical and Electronics Engineering, M.Kumarasamy College of Engineering, Karur, Tamilnadu, 639 113, India.
7
Jamin A, Abraham P, Humeau-Heurtier A. Machine learning for predictive data analytics in medicine: A review illustrated by cardiovascular and nuclear medicine examples. Clin Physiol Funct Imaging 2020; 41:113-127. [PMID: 33316137 DOI: 10.1111/cpf.12686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/05/2019] [Revised: 11/01/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022]
Abstract
Evidence-based medicine allows the physician to evaluate the risk-benefit ratio of a treatment through setting and data. The doctor can make risk-based choices using different kinds of information. With the emergence of new technologies, a large amount of data is recorded, offering interesting perspectives for predictive data analytics with machine learning. Machine learning is a set of methods that process data to model a learning problem. Supervised machine learning algorithms use annotated data to construct the model. This category of algorithms makes it possible to solve predictive data analytics problems. In this paper, we detail the use of supervised machine learning algorithms for predictive data analytics problems in medicine. In the medical field, data can be split into two categories: medical images and other data. For brevity, our review deals with any kind of medical data excluding images. In this article, we offer a discussion around four supervised machine learning approaches: information-based, similarity-based, probability-based and error-based approaches. Each method is illustrated with detailed cardiovascular and nuclear medicine examples. Our review shows that model ensemble (ME) and support vector machine (SVM) methods are the most popular. SVM, ME and artificial neural networks often lead to better results than those given by other algorithms. In the coming years, more studies, more data, more tools and more methods will certainly be proposed.
Affiliation(s)
- Antoine Jamin
- COTTOS Médical, Avrillé, France; LERIA-Laboratoire d'Etude et de Recherche en Informatique d'Angers, Univ. Angers, Angers, France; LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France
- Pierre Abraham
- Sports Medicine Department, UMR Mitovasc CNRS 6015 INSERM 1228, Angers University Hospital, Angers, France
- Anne Humeau-Heurtier
- LARIS-Laboratoire Angevin de Recherche en Ingénierie des Systèmes, Univ. Angers, Angers, France
8
World competitive contest-based artificial neural network: A new class-specific method for classification of clinical and biological datasets. Genomics 2020; 113:541-552. [PMID: 32991962 PMCID: PMC7521912 DOI: 10.1016/j.ygeno.2020.09.047] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/15/2020] [Revised: 09/05/2020] [Accepted: 09/22/2020] [Indexed: 12/26/2022]
Abstract
Many data mining methods have been proposed to generate computer-aided diagnostic systems, which may determine diseases in their early stages by categorizing the data into some proper classes. Considering the importance of a suitable classifier, the present study aims to introduce an efficient approach based on the World Competitive Contests (WCC) algorithm as well as a multi-layer perceptron artificial neural network (ANN). Unlike previously introduced methods, each of which develops a single universal model for all data classes, our proposed approach generates a specific model for each individual class of data. The experimental results show that the proposed method (ANNWCC), which can be applied to both balanced and unbalanced datasets, yields an average five-fold cross-validation accuracy of more than 76% (without feature selection) and 90% (with feature selection) on the 13 clinical and biological datasets. The findings also indicate that under different conditions, our proposed method can produce better results in comparison to some state-of-the-art meta-heuristic algorithms and methods in terms of various statistical and classification measurements. To classify the clinical and biological data, a multi-layer ANN and the WCC algorithm were combined. It was shown that developing a specific model for each individual class of data may yield better results compared with creating a universal model for all of the existing data classes. Besides, some efficient algorithms proved to be essential to generate acceptable biological results, and the methods' performance was found to be enhanced by fuzzifying or normalizing the biological data.
- We combined multi-layer artificial neural networks and world competitive contests algorithms to classify biological datasets.
- The proposed method has been investigated on 13 clinical datasets with different properties.
- Efficient models may yield better classification models and health diagnostic systems.
- Feature selection methods can improve the performance of a model in separating case and control samples.
9
Zheng K, Harris CE, Jennane R, Makrogiannis S. Integrative blockwise sparse analysis for tissue characterization and classification. Artif Intell Med 2020; 107:101885. [PMID: 32828443 DOI: 10.1016/j.artmed.2020.101885] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/08/2019] [Revised: 01/29/2020] [Accepted: 05/23/2020] [Indexed: 11/18/2022]
Abstract
The topic of sparse representation of samples in high dimensional spaces has attracted growing interest during the past decade. In this work, we develop sparse representation-based methods for classification of clinical imaging patterns into healthy and diseased states. We propose a spatial block decomposition method to address irregularities of the approximation problem and to build an ensemble of classifiers that we expect to yield more accurate numerical solutions than conventional sparse analyses of the complete spatial domain of the images. We introduce two classification decision strategies based on maximum a posteriori probability (BBMAP), or a log likelihood function (BBLL) and an approach to adjusting the classification decision criteria. To evaluate the performance of the proposed approach we used cross-validation techniques on imaging datasets with disease class labels. We first applied the proposed approach to diagnosis of osteoporosis using bone radiographs. In this problem we assume that changes in trabecular bone connectivity can be captured by intensity patterns. The second application domain is separation of breast lesions into benign and malignant categories in mammograms. The object classes in both of these applications are not linearly separable, and the classification accuracy may depend on the lesion size in the second application. Our results indicate that the proposed integrative sparse analysis addresses the ill-posedness of the approximation problem and produces very good class separation for trabecular bone characterization and for breast lesion characterization. Our approach yields higher classification rates than conventional sparse classification and previously published convolutional neural networks (CNNs) that we fine-tuned for our datasets, or utilized for feature extraction. 
The BBLL technique also produced higher classification rates than learners using hand-crafted texture features, and the Bag of Keypoints, which is a sophisticated patch-based method. Furthermore, our comparative experiments showed that the BBLL function may yield more accurate classification than BBMAP, because BBLL accounts for possible estimation bias.
Affiliation(s)
- Keni Zheng
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, Dover, DE 19901-2277, USA
- Chelsea E Harris
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, Dover, DE 19901-2277, USA
- Rachid Jennane
- I3MTO Laboratory, University of Orleans, 45067 Orleans, France
- Sokratis Makrogiannis
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, Dover, DE 19901-2277, USA.
10
Zheng K, Harris C, Bakic P, Makrogiannis S. Spatially localized sparse representations for breast lesion characterization. Comput Biol Med 2020; 123:103914. [PMID: 32768050 PMCID: PMC7416513 DOI: 10.1016/j.compbiomed.2020.103914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Revised: 07/02/2020] [Accepted: 07/11/2020] [Indexed: 11/19/2022]
Abstract
RATIONALE The topic of sparse representation of samples in high dimensional spaces has attracted growing interest during the past decade. In this work, we develop sparse representation-based methods for classification of radiological imaging patterns of breast lesions into benign and malignant states. METHODS We propose a spatial block decomposition method to address irregularities of the approximation problem and to build an ensemble of classifiers (CL) that we expect to yield more accurate numerical solutions than conventional whole-region of interest (ROI) sparse analyses. We introduce two classification decision strategies based on maximum a posteriori probability (BBMAP-S), or a log likelihood function (BBLL-S). RESULTS To evaluate the performance of the proposed approach we used cross-validation techniques on imaging datasets with disease class labels. We utilized the proposed approach for separation of breast lesions into benign and malignant categories in mammograms. The level of difficulty is high in this application and the accuracy may depend on the lesion size. Our results indicate that the proposed integrative sparse analysis addresses the ill-posedness of the approximation problem, producing AUC (area under the receiver operating curve) value of 89.1% for randomized 30-fold cross-validation. CONCLUSIONS Furthermore, our comparative experiments showed that the BBLL-S decision function may yield more accurate classification than BBMAP-S because BBLL-S accounts for possible estimation bias.
Affiliation(s)
- Keni Zheng
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, 1200 N. DuPont Hwy, Dover, DE, 19901-2277, USA
- Chelsea Harris
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, 1200 N. DuPont Hwy, Dover, DE, 19901-2277, USA
- Predrag Bakic
- Department of Radiology, Univ. of Pennsylvania, Philadelphia, PA, 19152, USA
- Sokratis Makrogiannis
- Division of Physics, Engineering, Mathematics and Computer Science, Delaware State University, 1200 N. DuPont Hwy, Dover, DE, 19901-2277, USA.
11
Hastings JF, O'Donnell YEI, Fey D, Croucher DR. Applications of personalised signalling network models in precision oncology. Pharmacol Ther 2020; 212:107555. [PMID: 32320730 DOI: 10.1016/j.pharmthera.2020.107555] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/12/2020] [Accepted: 04/07/2020] [Indexed: 02/07/2023]
Abstract
As our ability to provide in-depth, patient-specific characterisation of the molecular alterations within tumours rapidly improves, it is becoming apparent that new approaches will be required to leverage the power of this data and derive the full benefit for each individual patient. Systems biology approaches are beginning to emerge within this field as a potential method of incorporating large volumes of network level data and distilling a coherent, clinically-relevant prediction of drug response. However, the initial promise of this developing field is yet to be realised. Here we argue that in order to develop these precise models of individual drug response and tailor treatment accordingly, we will need to develop mathematical models capable of capturing both the dynamic nature of drug-response signalling networks and key patient-specific information such as mutation status or expression profiles. We also review the modelling approaches commonly utilised within this field, and outline recent examples of their use in furthering the application of systems biology for a precision medicine approach to cancer treatment.
Affiliation(s)
- Jordan F Hastings
- The Kinghorn Cancer Centre, Garvan Institute of Medical Research, Sydney, Australia
- Dirk Fey
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland; School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- David R Croucher
- The Kinghorn Cancer Centre, Garvan Institute of Medical Research, Sydney, Australia; School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland; St Vincent's Hospital Clinical School, University of New South Wales, Sydney, NSW 2052, Australia.
12
Nagarajan R, Miller CS, Dawson D, Ebersole JL. Biologic modelling of periodontal disease progression. J Clin Periodontol 2019; 46:160-169. [DOI: 10.1111/jcpe.13064] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/05/2018] [Revised: 12/03/2018] [Accepted: 01/05/2019] [Indexed: 12/17/2022]
Affiliation(s)
- Radhakrishnan Nagarajan
- Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, Kentucky
- Craig S. Miller
- Division of Oral Diagnosis, Oral Medicine and Oral Radiology, University of Kentucky, Lexington, Kentucky
- Center for Oral Health Research, College of Dentistry, University of Kentucky, Lexington, Kentucky
- Dolph Dawson
- Center for Oral Health Research, College of Dentistry, University of Kentucky, Lexington, Kentucky
- Division of Periodontics, University of Kentucky, Lexington, Kentucky
- Jeffrey L. Ebersole
- Center for Oral Health Research, College of Dentistry, University of Kentucky, Lexington, Kentucky
- Department of Biomedical Sciences, School of Dental Medicine, University of Nevada Las Vegas, Las Vegas, Nevada
13
Affiliation(s)
- Sriganesh Srihari
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland 4072, Australia; Current address: South Australian Health and Medical Research Institute, Adelaide 5001, South Australia, Australia.