1. Patel H, Shah H, Patel G, Patel A. Hematologic cancer diagnosis and classification using machine and deep learning: State-of-the-art techniques and emerging research directives. Artif Intell Med 2024; 152:102883. [PMID: 38657439 DOI: 10.1016/j.artmed.2024.102883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 04/16/2024] [Accepted: 04/18/2024] [Indexed: 04/26/2024]
Abstract
Hematology is the study of the diagnosis and treatment of blood diseases, including cancer. Cancer is considered one of the deadliest diseases across all age categories, so diagnosing it at an early stage is essential to curing it. Hematologists and pathologists rely on microscopic evaluation of blood or bone marrow smear images to diagnose blood-related ailments. Overlapping cells, cells of varying densities among platelets, uneven illumination, and the number of red and white blood cells make it difficult to diagnose illness from blood cell images, and the traditional process is time-consuming and demands considerable effort from pathologists. Machine learning and deep learning techniques now make it possible to automate diagnostic processes, categorize microscopic blood cells, and improve both the accuracy and the speed of the procedure, since models built with these methods can serve as an assistive tool. In this article, we acquired, analyzed, scrutinized, and finally selected around 57 research papers that apply machine learning and deep learning methodologies to the diagnosis and classification of leukemia. The papers were published between 2003 and 2023 and were retrieved from PubMed, IEEE, Science Direct, Google Scholar, and other pertinent sources. Our primary emphasis is on evaluating the advantages and limitations of comparable research efforts in order to provide a concise research directive that is useful to fellow researchers in the field.
Affiliation(s)
- Hema Patel
  - Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charotar University of Science and Technology, CHARUSAT Campus, Changa, 388421 Anand, Gujarat, India
- Himal Shah
  - QURE Haematology Centre, Ahmedabad 380006, Gujarat, India
- Gayatri Patel
  - Ramanbhai Patel College of Pharmacy, Charotar University of Science and Technology, CHARUSAT Campus, Changa, 388421 Anand, Gujarat, India
- Atul Patel
  - Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charotar University of Science and Technology, CHARUSAT Campus, Changa, 388421 Anand, Gujarat, India
2. Usategui I, Arroyo Y, Torres AM, Barbado J, Mateo J. Systemic Lupus Erythematosus: How Machine Learning Can Help Distinguish between Infections and Flares. Bioengineering (Basel) 2024; 11:90. [PMID: 38247967 PMCID: PMC11154352 DOI: 10.3390/bioengineering11010090] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/30/2023] [Revised: 01/07/2024] [Accepted: 01/15/2024] [Indexed: 01/23/2024] Open
Abstract
Systemic Lupus Erythematosus (SLE) is a multifaceted autoimmune disease that affects multiple body systems and presents with varied clinical manifestations. Early detection is considered the most effective way to save patients' lives, but detecting severe SLE activity in its early stages remains a formidable challenge. Consequently, this work advocates the use of Machine Learning (ML) algorithms for the diagnosis of SLE flares in the context of infections. The Random Forest (RF) method was employed because of its performance attributes, with the objective of uncovering patterns within the patient data. Multiple ML techniques were compared in this investigation. The proposed system showed an improvement in accuracy of around 7.49% over the k-Nearest Neighbors (KNN) algorithm. The Support Vector Machine (SVM), Binary Linear Discriminant Analysis (BLDA), Decision Trees (DT) and Linear Regression (LR) methods performed worse, with values of around 81%, 78%, 84% and 69%, respectively. The proposed method also achieved a higher area under the curve (AUC) and balanced accuracy (both around 94%) than the other ML approaches. These outcomes underscore the feasibility of building an automated diagnostic support method for SLE patients based on ML systems.
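The abstract reports balanced accuracy and AUC without defining them. The following is a minimal sketch of how these two metrics are commonly computed for a binary classifier, assuming labels coded as 1 (positive) and 0 (negative). The function names and the toy data are hypothetical and are not taken from the cited study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 5 negatives (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(balanced_accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Balanced accuracy is often preferred over plain accuracy when one class is much rarer than the other, because each class contributes equally to the score regardless of its size.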
Affiliation(s)
- Iciar Usategui
  - Department of Internal Medicine, Hospital Clínico Universitario, 47005 Valladolid, Spain
- Yoel Arroyo
  - Department of Technologies and Information Systems, Faculty of Social Sciences and Information Technologies, Universidad de Castilla-La Mancha (UCLM), 45600 Talavera de la Reina, Spain
- Ana María Torres
  - Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha (UCLM), 16071 Cuenca, Spain
  - Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Julia Barbado
  - Department of Internal Medicine, Hospital Universitario Río Hortega, 47012 Valladolid, Spain
- Jorge Mateo
  - Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha (UCLM), 16071 Cuenca, Spain
  - Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
3. Usategui I, Barbado J, Torres AM, Cascón J, Mateo J. Machine learning, a new tool for the detection of immunodeficiency patterns in systemic lupus erythematosus. J Investig Med 2023; 71:742-752. [PMID: 37158077 DOI: 10.1177/10815589231171404] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/10/2023]
Abstract
Systemic lupus erythematosus (SLE) is a complex autoimmune disease that affects several organs and causes variable clinical symptoms. Early diagnosis is currently the most effective way to save the lives of patients with SLE, but the disease is very difficult to detect in its early stages. This study therefore proposes a machine learning system to help diagnose patients with SLE. The extreme gradient boosting method was implemented because of its performance characteristics: it offers high performance, scalability, accuracy, and low computational load. With this method we try to recognize patterns in the patient data that allow SLE patients to be classified with high accuracy and differentiated from controls. Several machine learning methods were analyzed in this study. The proposed method achieves a higher predictive value for patients who may suffer from SLE than the other systems compared. It improved accuracy by 4.49% over k-Nearest Neighbors. The Support Vector Machine and Gaussian Naive Bayes (GNB) methods performed worse than the proposed method, reaching values of 83% and 81%, respectively. The proposed system also showed a higher area under the curve (90%) and balanced accuracy (90%) than the other machine learning methods. This study shows the usefulness of ML techniques for identifying and predicting SLE patients. These results demonstrate the possibility of developing automatic diagnostic support systems for SLE patients based on machine learning techniques.
Affiliation(s)
- Iciar Usategui
  - Internal Medicine Department, Hospital Clínico Universitario de Valladolid, Valladolid, Spain
- Julia Barbado
  - Autoimmune Diseases Unit, Río Hortega University Hospital, Valladolid, Spain
- Ana María Torres
  - Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
- Joaquín Cascón
  - Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
- Jorge Mateo
  - Medical Analysis Expert Group, Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
4. Mirzaei G. GraphChrom: A Novel Graph-Based Framework for Cancer Classification Using Chromosomal Rearrangement Endpoints. Cancers (Basel) 2022; 14:3060. [PMID: 35804833 PMCID: PMC9265123 DOI: 10.3390/cancers14133060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/18/2022] [Revised: 06/06/2022] [Accepted: 06/18/2022] [Indexed: 11/16/2022] Open
Abstract
Chromosomal rearrangements are generally a consequence of improperly repaired double-strand breaks in DNA. These genomic aberrations can be a driver of cancers. Here, we investigated the use of chromosomal rearrangements for classification of cancer tumors and the effect of inter- and intrachromosomal rearrangements in cancer classification. We used data from the Catalogue of Somatic Mutations in Cancer (COSMIC) for breast, pancreatic, and prostate cancers, for which the COSMIC dataset reports the highest number of chromosomal aberrations. We developed a framework known as GraphChrom for cancer classification. GraphChrom was developed using a graph neural network which models the complex structure of chromosomal aberrations (CA) and provides local connectivity between the aberrations. The proposed framework makes three important contributions to the field of cancer research. Firstly, it successfully classifies cancer types and subtypes. Secondly, it evolved into a novel data extraction technique which can be used to extract more informative graphs (informative aberrations associated with a sample). Thirdly, it predicts that interCAs (rearrangements between two or more chromosomes) are more effective in cancer prediction than intraCAs (rearrangements within the same chromosome), although intraCAs are three times more likely to occur than interCAs.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, Ohio State University, Marion, OH 43302, USA
5. Zhang X, Xiao H, Gao R, Zhang H, Wang Y. K-nearest neighbors rule combining prototype selection and local feature weighting for classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
6. Sharma P, Balabantaray BK, Bora K, Mallik S, Kasugai K, Zhao Z. An Ensemble-Based Deep Convolutional Neural Network for Computer-Aided Polyps Identification From Colonoscopy. Front Genet 2022; 13:844391. [PMID: 35559018 PMCID: PMC9086187 DOI: 10.3389/fgene.2022.844391] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/28/2021] [Accepted: 03/14/2022] [Indexed: 01/16/2023] Open
Abstract
Colorectal cancer (CRC) is the third leading cause of cancer death globally. Early detection and removal of precancerous polyps can significantly reduce the chance of CRC patient death. Currently, the polyp detection rate mainly depends on the skill and expertise of gastroenterologists. Over time, unidentified polyps can develop into cancer. Machine learning has recently emerged as a powerful method for assisting clinical diagnosis. Several classification models have been proposed to identify polyps, but their performance is not yet comparable to that of an expert endoscopist. Here, we propose a multiple classifier consultation strategy to create an effective and powerful classifier for polyp identification. This strategy benefits from recent findings that different classification models learn and extract different information from an image. Our ensemble classifier can therefore derive a more consequential decision than each individual classifier. The combined information inherits ResNet's advantage of residual connections, while it also extracts objects covered by occlusions through the depth-wise separable convolution layer of the Xception model. We applied our strategy to still frames extracted from a colonoscopy video. It outperformed other state-of-the-art techniques, with a performance measure greater than 95% in each of the algorithm parameters. Our method will help researchers and gastroenterologists develop clinically applicable, computationally guided tools for colonoscopy screening. It may be extended to other clinical diagnoses that rely on images.
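The abstract does not spell out how the individual classifiers are combined. One common way to combine them is soft voting, in which the per-class probabilities from each model are averaged and the class with the highest average is chosen. The sketch below illustrates that general idea only; it is an assumption and not the authors' documented method, and the function name, model names, and probabilities are hypothetical.

```python
def soft_vote(prob_sets, weights=None):
    """Combine per-class probabilities from several models by weighted averaging.

    prob_sets: one probability list per model, all over the same classes.
    weights:   optional per-model weights; equal weights are assumed if omitted.
    Returns (predicted_class_index, combined_probabilities).
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    n_classes = len(prob_sets[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(norm, prob_sets))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


# Hypothetical outputs for one frame; classes are [non-polyp, polyp].
resnet_probs = [0.40, 0.60]    # illustrative values, not from the paper
xception_probs = [0.70, 0.30]  # illustrative values, not from the paper

label, combined = soft_vote([resnet_probs, xception_probs])
print(label, combined)
```

Passing `weights` lets a more reliable model count for more in the average, which is one simple way to tune such an ensemble on validation data.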
Affiliation(s)
- Pallabi Sharma
  - Department of Computer Science and Engineering, National Institute of Technology Meghalaya, Shillong, India
- Bunil Kumar Balabantaray
  - Department of Computer Science and Engineering, National Institute of Technology Meghalaya, Shillong, India
- Kangkana Bora
  - Computer Science and Information Technology, Cotton University, Guwahati, India
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Kunio Kasugai
  - Department of Gastroenterology, Aichi Medical University, Nagakute, Japan
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, United States
7. Hamed A, Tahoun M, Nassar H. KNNHI: Resilient KNN algorithm for heterogeneous incomplete data classification and K identification using rough set theory. J Inf Sci 2022. [DOI: 10.1177/01655515211069539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
The original K-nearest neighbour (KNN) algorithm was meant to classify homogeneous complete data, that is, data with only numerical features whose values are all present. It therefore faces problems when used with heterogeneous incomplete (HI) data, which also has categorical features and is plagued with missing values. Many solutions have been proposed over the years, but most have pitfalls. For example, some solve heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others solve incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets, mostly from the University of California Irvine repository, using 10-fold cross-validation. We show that KNNHI outperforms six recently published KNN-based algorithms in terms of precision, recall, accuracy and F-score. In addition to its function as a classifier, KNNHI can also serve as a K calculator, helping KNN-based algorithms that use a single K value for all queries to find the best such value. We show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, the degree of heterogeneity and the metric used to measure distance.
8. Rahmani AM, Azhir E, Naserbakht M, Mohammadi M, Aldalwie AHM, Majeed MK, Taher Karim SH, Hosseinzadeh M. Automatic COVID-19 detection mechanisms and approaches from medical images: a systematic review. Multimedia Tools and Applications 2022; 81:28779-28798. [PMID: 35382107 PMCID: PMC8970643 DOI: 10.1007/s11042-022-12952-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2021] [Revised: 05/09/2021] [Accepted: 03/10/2022] [Indexed: 05/04/2023]
Abstract
Since early 2020, Coronavirus Disease 2019 (COVID-19) has spread widely around the world. COVID-19 infects the lungs, leading to breathing difficulties. Early detection of COVID-19 is important for the prevention and treatment of the pandemic. Medical imaging modalities such as Chest X-Rays (CXR), Computed Tomography (CT), and Magnetic Resonance Imaging (MRI) are regarded as desirable techniques for diagnosing COVID-19 cases. Medical images of coronavirus patients show that the lungs are filled with sticky mucus that prevents them from inhaling. Today, Artificial Intelligence (AI) based algorithms have brought a significant shift in computer-aided diagnosis due to their effective feature extraction capabilities. This survey presents a complete and systematic review of the application of Machine Learning (ML) methods to the detection of COVID-19, focusing on works that used medical images. We aimed to evaluate various ML-based techniques for detecting COVID-19 using medical imaging. A total of 26 papers were extracted from ACM, ScienceDirect, SpringerLink, Tech Science Press, and IEEE Xplore. The mechanisms are reviewed in five ML categories: supervised learning-based, deep learning-based, active learning-based, transfer learning-based, and evolutionary learning-based mechanisms. A number of articles are investigated in each group. Some directions for further research are also discussed to improve the detection of COVID-19 using ML techniques in the future. In most articles, deep learning is used as the ML method, and most of the researchers used CXR images to diagnose COVID-19. Most articles reported accuracy to evaluate model performance. The accuracy of the studied models ranged from 0.84 to 0.99. The studies demonstrate the current status of AI techniques and their potential in the fight against COVID-19.
Affiliation(s)
- Amir Masoud Rahmani
  - Future Technology Research Center, National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan
- Elham Azhir
  - Research and Development Center, Mobile Telecommunication Company of Iran, Tehran, Iran
- Morteza Naserbakht
  - Mental Health Research Center, Psychosocial Health Research Institute, Iran University of Medical Sciences, Tehran, Iran
- Mokhtar Mohammadi
  - Department of Information Technology, College of Engineering and Computer Science, Lebanese French University, Kurdistan Region, Iraq
- Adil Hussein Mohammed Aldalwie
  - Department of Communication and Computer Engineering, Faculty of Engineering, Cihan University-Erbil, Kurdistan Region, Iraq
- Mohammed Kamal Majeed
  - Information Technology Department, Faculty of Applied Science, Tishk International University, Erbil, Iraq
- Sarkhel H. Taher Karim
  - Computer Department, College of Science, University of Halabja, Halabja, Iraq
  - Computer Networks Department, Sulaimani Polytechnic University, Technical College of Informatics, Sulaymaniyah, Iraq
9. Garnica O, Gómez D, Ramos V, Hidalgo JI, Ruiz-Giardín JM. Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers. EPMA J 2021; 12:365-381. [PMID: 34484472 PMCID: PMC8405861 DOI: 10.1007/s13167-021-00252-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/11/2021] [Accepted: 07/30/2021] [Indexed: 12/12/2022]
Abstract
Background: Predicting bacteraemia is relevant because sepsis is one of the most important causes of morbidity and mortality. Bacteraemia prognosis primarily depends on a rapid diagnosis. Predicting bacteraemia would shorten the diagnosis by up to 6 days and, in conjunction with individual patient variables, should be considered when deciding on the early administration of personalised antibiotic treatment and medical services, the selection of specific diagnostic techniques and the determination of additional treatments, such as surgery, that would prevent subsequent complications. Machine learning techniques could help physicians make these informed decisions by predicting bacteraemia using the data already available in electronic hospital records.
Objective: This study presents the application of machine learning techniques to these records to predict the blood culture's outcome, which would reduce the lag in starting a personalised antibiotic treatment and the medical costs associated with erroneous treatments due to conservative assumptions about blood culture outcomes.
Methods: Six supervised classifiers were created using three machine learning techniques, Support Vector Machine, Random Forest and K-Nearest Neighbours, on the electronic health records of hospital patients. The best approach to handle missing data was chosen and, for each machine learning technique, two classification models were created: the first uses the features known at the time of blood extraction, whereas the second uses four extra features revealed during the blood culture.
Results: The six classifiers were trained and tested using a dataset of 4357 patients with 117 features per patient. In the best case, the models reach a state-of-the-art accuracy of 85.9%, a sensitivity of 87.4% and an AUC of 0.93.
Conclusions: Our results provide cutting-edge metrics of interest in predictive medical models, with values that exceed the medical practice threshold and previous results in the literature that used classical modelling techniques for specific types of bacteraemia. Additionally, the consistency of the results is reinforced because the feature importance rankings of the three classifiers contain similar features, which coincide with those that physicians use in their manual heuristics. Therefore, the efficacy of these machine learning techniques confirms their viability to assist in the aims of predictive and personalised medicine once the disease presents bacteraemia-compatible symptoms and to assist in improving the healthcare economy.
Affiliation(s)
- Oscar Garnica
  - Departamento de Arquitectura de Computadores, Universidad Complutense de Madrid, Madrid, Spain
- Diego Gómez
  - Universidad Complutense de Madrid, Madrid, Spain
- Víctor Ramos
  - Universidad Complutense de Madrid, Madrid, Spain
- J. Ignacio Hidalgo
  - Departamento de Arquitectura de Computadores, Universidad Complutense de Madrid, Madrid, Spain
- José M. Ruiz-Giardín
  - Departamento de Medicina Interna, Hospital Universitario de Fuenlabrada, Madrid, Spain