1
Wenderoth L, Asemissen AM, Modemann F, Nielsen M, Werner R. Transferable automatic hematological cell classification: Overcoming data limitations with self-supervised learning. Comput Methods Programs Biomed 2025; 260:108560. [PMID: 39693791 DOI: 10.1016/j.cmpb.2024.108560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 12/01/2024] [Accepted: 12/07/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND AND OBJECTIVE Classification of peripheral blood and bone marrow cells is critical in the diagnosis and monitoring of hematological disorders. The development of robust and reliable automatic classification systems is hampered by data scarcity and limited model generalizability across laboratories. The present study proposes integrating self-supervised learning (SSL) into cell classification pipelines to address these challenges. METHODS The experiments are based on four public hematological single-cell image datasets: one bone marrow and three peripheral blood datasets. The cell classification pipeline consists of two parts: (1) SSL-based image feature extraction without the use of image annotations, and (2) a lightweight machine learning classifier applied to the SSL features and trained on only a small number of annotated images. RESULTS Directly transferring SSL models trained on bone marrow data to peripheral blood data yielded higher balanced classification accuracy than transferring their supervised deep learning counterparts, for all blood datasets. After adapting the lightweight machine learning classifier with 50 labeled samples per class from the new dataset, the SSL pipeline surpasses supervised deep learning classification performance on one dataset and on classes with rare or atypical cell types, and performs similarly on the other datasets. CONCLUSIONS The results demonstrate that SSL enables (1) extraction of meaningful cell image features without the use of cell class information; (2) efficient transfer of knowledge between bone marrow and peripheral blood cell domains; and (3) efficient model adaptation to new datasets using only a few labeled data samples.
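The two-stage pipeline described in this abstract (frozen SSL features, then a lightweight classifier fitted on 50 labeled samples per class and scored by balanced accuracy) can be sketched as follows. This is a minimal illustration, not the authors' code: the SSL feature extractor is omitted and features are assumed to be precomputed vectors, and because the abstract does not name its lightweight classifier, a nearest-centroid classifier is used as a hypothetical stand-in. All function names are illustrative.

```python
import random
from collections import defaultdict


def sample_k_per_class(features, labels, k, seed=0):
    """Draw up to k labeled feature vectors per class (k=50 in the abstract)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    return {y: rng.sample(xs, min(k, len(xs))) for y, xs in by_class.items()}


def fit_centroids(support):
    """Average the frozen feature vectors of each class into one centroid.

    Hypothetical stand-in for the unnamed lightweight classifier.
    """
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in support.items()
    }


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare cell types count as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy 2-D "SSL features" for two cell classes.
feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["lymphocyte", "lymphocyte", "neutrophil", "neutrophil", "neutrophil"]
centroids = fit_centroids(sample_k_per_class(feats, labels, k=50))
preds = [predict(centroids, x) for x in feats]
print(balanced_accuracy(labels, preds))  # 1.0
```

Balanced accuracy averages recall over classes, so a rare or atypical cell type with few test images weighs as much as a common one, which is why it is a natural metric for imbalanced hematology datasets.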
Affiliation(s)
- Laura Wenderoth
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Christoph-Probst-Weg 1, 20251 Hamburg, Germany; Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany; Center for Biomedical Artificial Intelligence (bAIome), University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Anne-Marie Asemissen
- II. Department of Medicine, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Franziska Modemann
- II. Department of Medicine, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Maximilian Nielsen
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Christoph-Probst-Weg 1, 20251 Hamburg, Germany; Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany; Center for Biomedical Artificial Intelligence (bAIome), University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- René Werner
- Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Christoph-Probst-Weg 1, 20251 Hamburg, Germany; Institute of Computational Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany; Center for Biomedical Artificial Intelligence (bAIome), University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany.
2
Chossegros M, Delhommeau F, Stockholm D, Tannier X. Improving the generalizability of white blood cell classification with few-shot domain adaptation. J Pathol Inform 2024; 15:100405. [PMID: 39687668 PMCID: PMC11648780 DOI: 10.1016/j.jpi.2024.100405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 09/27/2024] [Accepted: 10/30/2024] [Indexed: 12/18/2024]
Abstract
The morphological classification of nucleated blood cells is fundamental for the diagnosis of hematological diseases. Many deep learning algorithms have been implemented to automate this classification task, but in most cases they fail to classify images coming from different sources. This is known as "domain shift". Although some research has been conducted in this area, domain adaptation techniques are often computationally expensive and can introduce significant modifications to the initial cell images. In this article, we propose an easy-to-implement workflow in which we trained a model to classify images from two datasets and tested it on images from eight other datasets. An EfficientNet model was trained on a source dataset comprising images from two different datasets. It was then fine-tuned on each of the eight target datasets using 100 or fewer annotated images from that dataset. Images from both the source and the target datasets underwent a color transform to put them into a standardized color style. The importance of the color transform and of fine-tuning was evaluated through an ablation study and visually assessed with scatter plots, and an extensive error analysis was carried out. The model achieved an accuracy higher than 80% on every dataset and exceeded 90% on more than half of the datasets. The presented workflow yielded promising results in terms of generalizability, significantly improving performance on the target datasets while keeping computational cost low and maintaining consistent color transformations. Source code is available at: https://github.com/mc2295/WBC_Generalization.
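The abstract does not specify which color transform is used. As a hedged sketch of the general idea of mapping source and target images into one standardized color style, the snippet below matches each RGB channel's mean and standard deviation to those of a reference style. This per-channel statistics transfer is a simple technique swapped in for illustration, not necessarily the authors' transform; images are assumed to be lists of (R, G, B) tuples, and the function names are hypothetical.

```python
import statistics


def channel_stats(pixels):
    """Per-channel (mean, population std) for a list of (R, G, B) tuples."""
    stats = []
    for c in range(3):
        values = [p[c] for p in pixels]
        stats.append((statistics.fmean(values), statistics.pstdev(values)))
    return stats


def match_color_style(pixels, reference_stats):
    """Shift and scale each RGB channel so its mean/std match a reference style.

    Hypothetical stand-in for the paper's color transform: plain per-channel
    statistics transfer, clipped to the 8-bit range [0, 255].
    """
    source_stats = channel_stats(pixels)
    styled = []
    for p in pixels:
        new_pixel = []
        for c in range(3):
            mu_s, sd_s = source_stats[c]
            mu_r, sd_r = reference_stats[c]
            scale = sd_r / sd_s if sd_s > 0 else 1.0  # flat channel: only shift
            value = (p[c] - mu_s) * scale + mu_r
            new_pixel.append(min(255.0, max(0.0, value)))
        styled.append(tuple(new_pixel))
    return styled


# One fixed reference style, applied to every source and target image.
reference = channel_stats([(100, 100, 100), (140, 140, 140)])
print(match_color_style([(10, 20, 30), (30, 40, 50)], reference))
```

Computing the reference statistics once and applying the same mapping to every source and target image mirrors the abstract's point that both domains are put into a single standardized color style before training and fine-tuning.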
Affiliation(s)
- Manon Chossegros
- Sorbonne Université, Inserm, Universite Sorbonne Paris-Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, 15 Rue de l'École de Médecine, 75006 Paris, France
- François Delhommeau
- Sorbonne Université, Inserm, Centre de Recherche Saint-Antoine, CRSA, Paris 27 rue de Chaligny, 75012 Paris, France
- Daniel Stockholm
- Sorbonne Université, Inserm, Centre de Recherche Saint-Antoine, CRSA, Paris 27 rue de Chaligny, 75012 Paris, France
- PSL Research University, EPHE, Paris 4-14 Rue Ferrus, 75014 Paris, France
- Xavier Tannier
- Sorbonne Université, Inserm, Universite Sorbonne Paris-Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé, LIMICS, 15 Rue de l'École de Médecine, 75006 Paris, France
3
Olayah F, Senan EM, Ahmed IA, Awaji B. Blood Slide Image Analysis to Classify WBC Types for Prediction Haematology Based on a Hybrid Model of CNN and Handcrafted Features. Diagnostics (Basel) 2023; 13:diagnostics13111899. [PMID: 37296753 DOI: 10.3390/diagnostics13111899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023]
Abstract
White blood cells (WBCs) are one of the main components of blood and are produced by the bone marrow. WBCs are part of the immune system, which protects the body from infectious diseases, and an increase or decrease in the count of any WBC type can indicate a particular disease. Thus, recognizing WBC types is essential for assessing the patient's health and identifying the disease. Analyzing blood samples to determine WBC counts and types requires experienced doctors. Artificial intelligence techniques were applied to analyze blood samples and classify WBC types, helping doctors distinguish between infectious diseases associated with increased or decreased WBC counts. This study developed strategies for analyzing blood slide images to classify WBC types. The first strategy classifies WBC types with the SVM-CNN technique. The second strategy classifies WBC types with an SVM applied to hybrid CNN features, in techniques called VGG19-ResNet101-SVM, ResNet101-MobileNet-SVM, and VGG19-ResNet101-MobileNet-SVM. The third strategy classifies WBC types with a feed-forward neural network (FFNN) based on a hybrid of CNN and handcrafted features. With MobileNet and handcrafted features, the FFNN achieved an AUC of 99.43%, accuracy of 99.80%, precision of 99.75%, specificity of 99.75%, and sensitivity of 99.68%.
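The reported figures (accuracy, precision, sensitivity, specificity) are typically derived from a multiclass confusion matrix in a one-vs-rest fashion. The abstract does not state the averaging scheme, so macro-averaging is assumed below. The sketch also shows one plausible way to build a hybrid feature vector by concatenating CNN and handcrafted features; the per-block L2 normalization is an assumption, not a detail given in the abstract, and the function names are hypothetical.

```python
import math


def fuse_features(cnn_features, handcrafted_features):
    """Concatenate CNN and handcrafted feature vectors after L2-normalizing each block.

    The per-block normalization is an assumption, so that neither block
    dominates purely because of its numeric scale.
    """
    def l2_normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    return l2_normalize(cnn_features) + l2_normalize(handcrafted_features)


def macro_metrics(cm):
    """Accuracy plus macro-averaged one-vs-rest precision, sensitivity, specificity.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precision, sensitivity, specificity = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        precision.append(tp / (tp + fp) if tp + fp else 0.0)
        sensitivity.append(tp / (tp + fn) if tp + fn else 0.0)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "precision": sum(precision) / n,
        "sensitivity": sum(sensitivity) / n,
        "specificity": sum(specificity) / n,
    }


print(fuse_features([3.0, 4.0], [0.0, 2.0]))  # [0.6, 0.8, 0.0, 1.0]
print(macro_metrics([[8, 2], [1, 9]]))
```

The fused vector would then be the input to the classifier (an FFNN in the third strategy), and the confusion matrix of its test predictions would feed `macro_metrics`.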
Affiliation(s)
- Fekry Olayah
- Department of Information System, Faculty Computer Science and information System, Najran University, Najran 66462, Saudi Arabia
| | - Ebrahim Mohammed Senan
- Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Alrazi University, Sana'a, Yemen
- Bakri Awaji
- Department of Computer Science, Faculty of Computer Science and Information System, Najran University, Najran 66462, Saudi Arabia
4
Li M, Lin C, Ge P, Li L, Song S, Zhang H, Lu L, Liu X, Zheng F, Zhang S, Sun X. A deep learning model for detection of leukocytes under various interference factors. Sci Rep 2023; 13:2160. [PMID: 36750590 PMCID: PMC9905612 DOI: 10.1038/s41598-023-29331-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/01/2022] [Accepted: 02/02/2023] [Indexed: 02/09/2023]
Abstract
The accurate detection of leukocytes is the basis for the diagnosis of blood system diseases. However, diagnosing leukocyte disorders is time-consuming for doctors and requires extensive experience. Automated detection methods with high accuracy can improve detection efficiency and provide recommendations to inexperienced doctors. Current methods and instruments either do not fully automate the identification process or perform poorly, and suitable leukocyte datasets are still needed for further study. To improve the current status, more intelligent strategies need to be developed. This paper investigates high-performance automatic leukocyte detection using a deep learning-based method. We established a new dataset more suitable for leukocyte detection, containing 6273 images (8595 leukocytes) and covering nine common clinical interference factors. Based on this dataset, six mainstream detection models are evaluated, and a more robust ensemble model is proposed. The mean average precision (mAP@IoU = 0.50:0.95) and mean average recall (mAR@IoU = 0.50:0.95) of the ensemble model on the test set are 0.853 and 0.922, respectively. Detection performance on poor-quality images is robust. For the first time, we find that the ensemble model achieves an accuracy of 98.84% in detecting incomplete leukocytes. In addition, we compared the test results of the different models, identified identical false detections shared by multiple models, and provided corresponding suggestions for clinical practice.
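mAP@IoU = 0.50:0.95 averages average precision over ten IoU thresholds (0.50, 0.55, ..., 0.95). A minimal single-class sketch follows, assuming boxes are `(x1, y1, x2, y2)` tuples and detections are `(score, box)` pairs. It uses greedy matching and all-point interpolated AP, which approximates but is not identical to the COCO evaluation code (COCO uses 101-point interpolation); with several classes, the per-class values would additionally be averaged. Function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, threshold):
    """All-point interpolated AP for one class at one IoU threshold.

    detections: list of (score, box); ground_truth: list of boxes.
    """
    matched, tp_flags = set(), []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j not in matched:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= threshold:
            matched.add(best_j)
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth) if ground_truth else 0.0)
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map_50_95(detections, ground_truth):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(detections, ground_truth, t) for t in thresholds) / len(thresholds)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(round(map_50_95(dets, gt), 3))  # 0.833
```

Averaging over the whole 0.50 to 0.95 range rewards tight localization: a box that overlaps a leukocyte only loosely counts as correct at low thresholds but as a false positive at high ones.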
Affiliation(s)
- Meiyu Li
- Tianjin Cancer Hospital Airport Hospital, National Clinical Research Center for Cancer, Tianjin, China
- Cong Lin
- School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai, China
- Peng Ge
- Tianjin Cancer Hospital Airport Hospital, National Clinical Research Center for Cancer, Tianjin, China
- Lei Li
- Clinical Laboratory, Tianjin Chest Hospital, Tianjin, China
- Shuang Song
- Tianjin Cancer Hospital Airport Hospital, National Clinical Research Center for Cancer, Tianjin, China
- Hanshan Zhang
- The Australian National University, Canberra, Australia
- Lu Lu
- Institute of Disaster Medicine, Tianjin University, Tianjin, China
- Xiaoxiang Liu
- School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai, China
- Fang Zheng
- School of Medical Laboratory, Tianjin Medical University, Tianjin, China
- Shijie Zhang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Xuguo Sun
- School of Medical Laboratory, Tianjin Medical University, Tianjin, China
5
Ha Y, Du Z, Tian J. Fine-grained interactive attention learning for semi-supervised white blood cell classification. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
6
New segmentation and feature extraction algorithm for classification of white blood cells in peripheral smear images. Sci Rep 2021; 11:19428. [PMID: 34593873 PMCID: PMC8484470 DOI: 10.1038/s41598-021-98599-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/05/2021] [Accepted: 09/13/2021] [Indexed: 01/19/2023]
Abstract
This article presents a new method for classifying white blood cells (WBCs) using image processing techniques and machine learning methods. The proposed method consists of three steps: detecting the nucleus and cytoplasm, extracting features, and classification. First, a new algorithm is designed to segment the nucleus. To detect the cytoplasm, only the part of it that lies inside the convex hull of the nucleus is considered; this approach helps overcome the difficulties of segmenting the cytoplasm. In the second phase, three shape features and four novel color features are devised and extracted. Finally, the WBCs are classified with an SVM model. The segmentation algorithm detects the nucleus with a Dice similarity coefficient of 0.9675. The proposed method categorizes WBCs in the Raabin-WBC, LISC, and BCCD datasets with accuracies of 94.65%, 92.21%, and 94.20%, respectively. We also show that the proposed method generalizes better than pre-trained CNN models. Notably, the classifier's hyperparameters are tuned only on the Raabin-WBC dataset and are not readjusted for the LISC and BCCD datasets.
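Two steps in this abstract lend themselves to a short sketch: scoring a segmentation with the Dice similarity coefficient, and restricting cytoplasm detection to the convex hull of the nucleus. Below, masks are assumed to be sets of `(row, col)` pixels, the whole-cell mask is assumed to be given, and the convex hull uses Andrew's monotone chain algorithm. The paper's actual nucleus segmentation algorithm is not reproduced here, and all names are illustrative.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks given as sets of (row, col)."""
    if not mask_a and not mask_b:
        return 1.0
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices ordered so every turn is a left turn."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of the convex hull."""
    if len(hull) < 3:
        return p in hull  # degenerate hull: only its own vertices count
    n = len(hull)
    return all(_cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def cytoplasm_in_hull(nucleus, cell):
    """Keep only non-nucleus cell pixels that lie inside the nucleus convex hull."""
    hull = convex_hull(list(nucleus))
    return {p for p in cell - nucleus if inside_hull(hull, p)}


# Toy nucleus: four separated lobes at the corners of a 5x5 square.
nucleus = {(0, 0), (0, 4), (4, 0), (4, 4)}
cell = {(r, c) for r in range(7) for c in range(7)}
print(len(cytoplasm_in_hull(nucleus, cell)))  # 21
print(dice({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))  # 0.5
```

Restricting the search to the nucleus hull means only the cytoplasm enclosed by a lobed nucleus is examined, so the fuzzy outer cytoplasm boundary never has to be segmented.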