1
Szeghalmy S, Fazekas A. A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning. SENSORS (BASEL, SWITZERLAND) 2023; 23:2333. [PMID: 36850931 PMCID: PMC9967638 DOI: 10.3390/s23042333] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/22/2022] [Revised: 02/06/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]
Abstract
Nowadays, the solution to many practical problems relies on machine learning tools. However, compiling the appropriate training data set for real-world classification problems is challenging because collecting the right amount of data for each class is often difficult or even impossible. In such cases, we can easily face the problem of imbalanced learning. There are many methods in the literature for solving the imbalanced learning problem, so it has become a serious question how to compare the performance of the imbalanced learning methods. Inadequate validation techniques can provide misleading results (e.g., due to data shift), which leads to the development of methods designed for imbalanced data sets, such as stratified cross-validation (SCV) and distribution optimally balanced SCV (DOB-SCV). Previous studies have shown that higher classification performance scores (AUC) can be achieved on imbalanced data sets using DOB-SCV instead of SCV. We investigated the effect of the oversamplers on this difference. The study was conducted on 420 data sets, involving several sampling methods and the DTree, kNN, SVM, and MLP classifiers. We point out that DOB-SCV often provides a little higher F1 and AUC values for classification combined with sampling. However, the results also prove that the selection of the sampler-classifier pair is more important for the classification performance than the choice between the DOB-SCV and the SCV techniques.
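As a rough illustration of the stratified cross-validation (SCV) idea mentioned in this abstract, the sketch below deals the samples of each class round-robin across the folds so that every fold keeps approximately the original class ratio. It is not the authors' implementation and does not cover DOB-SCV; the function name `stratified_folds`, the seed, and the 90/10 toy labels are assumptions made only for this example.

```python
from collections import Counter, defaultdict
import random


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's samples round-robin so each fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical imbalanced data set: 90 majority (0) and 10 minority (1) samples.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, k=5)
fold_class_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

With these toy labels every fold receives 18 majority and 2 minority samples, whereas a purely random split could leave a fold with no minority samples at all. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead of hand-written code.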
2
Minority-Prediction-Probability-based Oversampling Technique for Imbalanced Learning. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
3
Gupta S, Gupta MK, Shabaz M, Sharma A. Deep learning techniques for cancer classification using microarray gene expression data. Front Physiol 2022; 13:952709. [PMID: 36246115 PMCID: PMC9563992 DOI: 10.3389/fphys.2022.952709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 09/01/2022] [Indexed: 11/28/2022]
Abstract
Cancer is one of the top causes of death globally. Recently, microarray gene expression data has been used to aid in cancer’s effective and early detection. The use of DNA microarray technology to uncover information from the expression levels of thousands of genes has enormous promise. The DNA microarray technique can determine the levels of thousands of genes simultaneously in a single experiment. The analysis of gene expression is critical in many disciplines of biological study to obtain the necessary information. This study analyses all the research studies focused on optimizing gene selection for cancer detection using artificial intelligence. One of the most challenging issues is figuring out how to extract meaningful information from massive databases. Deep learning architectures have performed efficiently in numerous sectors and are used to diagnose many other chronic diseases and to assist physicians in making medical decisions. In this study, we have evaluated the results of different optimizers on an RNA sequence dataset. The deep learning algorithm proposed in the study classifies five different forms of cancer, including kidney renal clear cell carcinoma (KIRC), Breast Invasive Carcinoma (BRCA), lung adenocarcinoma (LUAD), Prostate Adenocarcinoma (PRAD) and Colon Adenocarcinoma (COAD). The performance of different optimizers like Stochastic gradient descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient Optimizer (AdaGrad), and Adaptive Moment Estimation (Adam) was evaluated. The experimental results gathered on the dataset affirm that AdaGrad and Adam. Also, the performance analysis has been done using different learning rates and decay rates. This study discusses current advancements in deep learning-based gene expression data analysis using optimized feature selection methods.
Affiliation(s)
- Surbhi Gupta
  - Department of Computer Science and Engineering Department, SMVDU, Jammu, India
  - Model Institute of Engineering and Technology, Jammu, India
- Manoj K. Gupta
  - Department of Computer Science and Engineering Department, SMVDU, Jammu, India
- Mohammad Shabaz
  - Model Institute of Engineering and Technology, Jammu, India
  - *Correspondence: Mohammad Shabaz,
- Ashutosh Sharma
  - School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
4
Gupta S, Shabaz M, Vyas S. Artificial intelligence and IoT based prediction of Covid-19 using chest X-ray images. SMART HEALTH (AMSTERDAM, NETHERLANDS) 2022; 25:100299. [PMID: 35783463 PMCID: PMC9233885 DOI: 10.1016/j.smhl.2022.100299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Revised: 05/31/2022] [Accepted: 06/15/2022] [Indexed: 11/30/2022]
Abstract
Coronavirus illness (COVID-19), discovered in late 2019, has spread rapidly worldwide, resulting in significant mortality. This study analyzed the performance of studies that employed machine learning (ML) and deep learning (DL) on chest X-ray images and CT scans for COVID-19 diagnosis. ML approaches on CT and X-ray images aided in correctly identifying COVID-19. The fast spread of COVID-19 worldwide and the growing number of deaths necessitate an immediate response from all sectors. Authorities will be able to deal with the effects more efficiently if such illnesses can be predicted in the future. Furthermore, it is crucial to keep track of the number of infected persons through regular check-ups, and it is frequently required to confine affected people and implement medical treatments. In addition, various additional elements, such as environmental influences and commonalities among the most afflicted places, should be considered to slow the spread of COVID-19, and precautions should be taken. AI-based approaches for the prediction and diagnosis of COVID-19 were suggested in this paper. This Review Article discusses current advances in AI technology and its biological applications, particularly the coronavirus.
Affiliation(s)
- Surbhi Gupta
  - Model Institute of Engineering and Technology, Jammu, J&K, India
- Mohammad Shabaz
  - Model Institute of Engineering and Technology, Jammu, J&K, India
- Sonali Vyas
  - University of Petroleum and Energy Studies, Dehradun, India
5
Prusty S, Patnaik S, Dash SK. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. FRONTIERS IN NANOTECHNOLOGY 2022. [DOI: 10.3389/fnano.2022.972421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Cancer is the unregulated development of abnormal cells in the human body. Cervical cancer, also known as cervix cancer, develops on the cervix’s surface. This causes an overabundance of cells to build up, eventually forming a lump or tumour. As a result, early detection is essential to determine which treatment will be effective. Machine Learning (ML) techniques can therefore help predict cervical cancer before it becomes too serious. Four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, have been compared and predicted with four common ML models, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NNs), and Extreme Gradient Boosting (XGB). Additionally, to improve the performance of the ML models, the Stratified k-fold cross-validation (SKCV) method has been applied. The findings of the experiments demonstrate that utilizing an RF classifier for analyzing the cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
6
Gupta S, Kalaivani S, Rajasundaram A, Ameta GK, Oleiwi AK, Dugbakie BN. Prediction Performance of Deep Learning for Colon Cancer Survival Prediction on SEER Data. BIOMED RESEARCH INTERNATIONAL 2022; 2022:1467070. [PMID: 35757479 PMCID: PMC9225873 DOI: 10.1155/2022/1467070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/04/2022] [Revised: 05/21/2022] [Accepted: 05/25/2022] [Indexed: 11/22/2022]
Abstract
Colon and rectal cancers are the most common kinds of cancer globally. Colon cancer is more prevalent in men than in women. Early detection increases the likelihood of survival, and treatment significantly increases the likelihood of eradicating the disease. The Surveillance, Epidemiology, and End Results (SEER) programme is an excellent source of domestic cancer statistics. SEER includes nearly 30% of the United States population, covering various races and geographic locations. The data are made public via the SEER website when a SEER limited-use data agreement form is submitted and approved. We investigate data from the SEER programme, specifically colon cancer statistics, in this study. Our objective is to create reliable colon cancer survival and conditional survival prediction algorithms. In this study, we have presented an overview of cancer diagnosis methods and the treatments used to cure cancer. This paper presents an analysis of prediction performance of multiple deep learning approaches. The performance of multiple deep learning models is thoroughly examined to discover which algorithm surpasses the others, followed by an investigation of the network's prediction accuracy. The simulation outcomes indicate that automated prediction models can predict colon cancer patient survival. Deep autoencoders displayed the best performance outcomes attaining 97% accuracy and 95% area under curve-receiver operating characteristic (AUC-ROC).
Affiliation(s)
- Surbhi Gupta
  - Model Institute of Engineering & Technology, Jammu, J&K, India
- S. Kalaivani
  - School of Information Technology and Engineering, Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India
- Archana Rajasundaram
  - Department of Anatomy, Sree Balaji Medical College and Hospital, Chennai, Tamil Nadu, India
- Gaurav Kumar Ameta
  - Department of Computer Engineering, Indus Institute of Technology & Engineering, Indus University, Ahmedabad, Gujarat, India
- Ahmed Kareem Oleiwi
  - Department of Computer Technical Engineering, The Islamic University, 54001 Najaf, Iraq
- Betty Nokobi Dugbakie
  - Department of Chemical Engineering, Kwame Nkrumah University of Science and Technology (KNUST), Ghana
7
Kumar Y, Gupta S, Singla R, Hu YC. A Systematic Review of Artificial Intelligence Techniques in Cancer Prediction and Diagnosis. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2021; 29:2043-2070. [PMID: 34602811 PMCID: PMC8475374 DOI: 10.1007/s11831-021-09648-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 05/23/2021] [Accepted: 09/11/2021] [Indexed: 05/05/2023]
Abstract
Artificial intelligence has aided in the advancement of healthcare research. The availability of open-source healthcare statistics has prompted researchers to create applications that aid cancer detection and prognosis. Deep learning and machine learning models provide a reliable, rapid, and effective solution to deal with such challenging diseases in these circumstances. PRISMA guidelines were used to select the articles published on Web of Science, EBSCO, and EMBASE between 2009 and 2021. In this study, we performed an efficient search and included the research articles that employed AI-based learning approaches for cancer prediction. A total of 185 papers are considered impactful for cancer prediction using conventional machine and deep learning-based classifications. In addition, the survey discussed the work done by different researchers, highlighted the limitations of the existing literature, and compared the studies using various parameters such as prediction rate, accuracy, sensitivity, specificity, dice score, detection rate, area under the curve, precision, recall, and F1-score. Five investigations have been designed, and solutions to those were explored. Although multiple techniques recommended in the literature have achieved great prediction results, cancer mortality has still not been reduced. Thus, more extensive research to deal with the challenges in the area of cancer prediction is required.
Affiliation(s)
- Yogesh Kumar
  - Department of Computer Engineering, Indus Institute of Technology & Engineering, Indus University, Rancharda, Via: Shilaj, Ahmedabad, Gujarat 382115 India
- Surbhi Gupta
  - School of Computer Science and Engineering, Model Institute of Engineering and Technology, Kot bhalwal, Jammu, J&K 181122 India
- Ruchi Singla
  - Department of Research, Innovations, Sponsored Projects and Entrepreneurship, Chandigarh Group of Colleges, Landran, Mohali India
- Yu-Chen Hu
  - Department of Computer Science and Information Management, Providence University, Taichung City, Taiwan, ROC