1
Hologram Noise Model for Data Augmentation and Deep Learning. Sensors (Basel) 2024; 24:948. [PMID: 38339665 PMCID: PMC10857140 DOI: 10.3390/s24030948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 01/23/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024]
Abstract
This paper introduces a noise augmentation technique designed to enhance the robustness of state-of-the-art (SOTA) deep learning models against degraded image quality, a common challenge in long-term recording systems. Our method, demonstrated through the classification of digital holographic images, utilizes a novel approach to synthesize and apply random colored noise, addressing the typically encountered correlated noise patterns in such images. Empirical results show that our technique not only maintains classification accuracy in high-quality images but also significantly improves it when given noisy inputs without increasing the training time. This advancement demonstrates the potential of our approach for augmenting data for deep learning models to perform effectively in production under varied and suboptimal conditions.
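The abstract does not specify the exact noise model, but a common way to synthesize random colored (power-law) noise of the kind described — noise whose power spectrum falls off as 1/f^alpha, producing spatially correlated patterns — is to shape white noise in the frequency domain. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def colored_noise(shape, alpha=1.0, rng=None):
    """Synthesize 2-D colored noise whose power spectrum falls off as 1/f**alpha."""
    rng = np.random.default_rng(rng)
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0  # avoid dividing by zero at the DC component
    spectrum = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    noise = np.fft.ifft2(spectrum / f ** (alpha / 2.0)).real
    return (noise - noise.mean()) / noise.std()  # zero mean, unit variance

# augment a (here blank) image with pink-ish noise at 10% amplitude
img = np.zeros((64, 64))
noisy = img + 0.1 * colored_noise((64, 64), alpha=1.0, rng=0)
```

With `alpha=0` this degenerates to white noise; larger `alpha` concentrates energy at low frequencies, i.e. stronger spatial correlation.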
2
YOLOv5-MS: Real-Time Multi-Surveillance Pedestrian Target Detection Model for Smart Cities. Biomimetics (Basel) 2023; 8:480. [PMID: 37887611 PMCID: PMC10604626 DOI: 10.3390/biomimetics8060480] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/15/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/28/2023]
Abstract
Intelligent video surveillance plays a pivotal role in enhancing the infrastructure of smart urban environments. The seamless integration of multi-angled cameras, functioning as perceptive sensors, significantly enhances pedestrian detection and augments security measures in smart cities. Nevertheless, current pedestrian-focused target detection encounters challenges such as slow detection speeds and increased costs. To address these challenges, we introduce YOLOv5-MS, a YOLOv5-based model for target detection. Initially, we optimize the multi-threaded acquisition of video streams within YOLOv5 to ensure image stability and real-time performance. Subsequently, leveraging reparameterization, we replace the original backbone convolutions with RepVGGBlock, streamlining the model by reducing convolutional layer channels and thereby enhancing the inference speed. Additionally, the incorporation of a bioinspired "squeeze and excitation" module in the convolutional neural network significantly enhances detection accuracy: the module improves focus on targets and diminishes the influence of irrelevant elements. Furthermore, integrating the K-means algorithm and bioinspired Retinex image augmentation during training effectively enhances the model's detection efficacy. Finally, loss computation adopts the Focal-EIOU approach. Empirical findings on our internally developed smart city dataset show that YOLOv5-MS achieves a 96.5% mAP, a 2.0% improvement over YOLOv5s, while the average inference speed increases by a notable 21.3%. These results substantiate the model's superiority and its capacity to perform pedestrian detection effectively within an intranet of over 50 video surveillance cameras, meeting our stringent requirements.
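The "squeeze and excitation" module mentioned above has a well-known standard form: global-average-pool each channel, pass the pooled vector through a small bottleneck MLP, and rescale the channels by the resulting sigmoid weights. A single forward pass can be sketched in NumPy (the weights here are illustrative; in YOLOv5-MS they would be learned inside the backbone):

```python
import numpy as np

def se_forward(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass for a (C, H, W) feature map."""
    s = x.mean(axis=(1, 2))                         # squeeze: (C,) global context
    hdn = np.maximum(s @ w1 + b1, 0.0)              # channel reduction + ReLU
    e = 1.0 / (1.0 + np.exp(-(hdn @ w2 + b2)))      # excitation weights in (0, 1)
    return x * e[:, None, None]                     # per-channel rescaling

rng = np.random.default_rng(0)
C, r = 8, 2                                         # channels, reduction ratio
x = rng.standard_normal((C, 4, 4))
w1, b1 = rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C // r, C)), np.zeros(C)
y = se_forward(x, w1, b1, w2, b2)
```

Because each excitation weight lies in (0, 1), the module can only attenuate channels relative to the input, which is how it suppresses irrelevant feature channels.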
3
YOLOv5s-gnConv: detecting personal protective equipment for workers at height. Front Public Health 2023; 11:1225478. [PMID: 37841722 PMCID: PMC10569216 DOI: 10.3389/fpubh.2023.1225478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 09/05/2023] [Indexed: 10/17/2023]
Abstract
Introduction Falls from height (FFH) accidents can devastate families and individuals. Currently, the best way to prevent falls from heights is to wear personal protective equipment (PPE). However, traditional manual checking methods for safety hazards are inefficient and make it difficult to detect and eliminate potential risks. Methods To better detect whether a person working at height is wearing PPE, this paper first applies field research and Python crawling techniques to create a dataset of people working at height, extends the dataset to 10,000 images through data enhancement (brightness, rotation, blurring, and Mosaic), and splits the dataset into a training set, a validation set, and a test set in a 7:2:1 ratio. In this study, three improved YOLOv5s models are proposed for detecting PPE on construction sites with many open-air operations, complex construction scenarios, and frequent personnel changes. Among them, YOLOv5s-gnConv is wholly based on the convolutional structure: it achieves effective modeling of higher-order spatial interactions through gated convolution (gnConv) and a recursive design, improves the performance of the algorithm, and increases the expressiveness of the model while reducing the network parameters. Results Experimental results show that YOLOv5s-gnConv outperforms the official YOLOv5s model by 5.01%, 4.72%, and 4.26% in precision, recall, and mAP_0.5, respectively, better ensuring the safety of workers at height. Discussion To deploy the YOLOv5s-gnConv model in a construction site environment and effectively monitor and manage the safety of workers at height, we also discuss the impacts and potential limitations of lighting conditions, camera angles, and worker movement patterns.
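The data-enhancement step above (brightness, rotation, blurring) can be illustrated with minimal NumPy stand-ins for those three operations. This is a sketch, not the authors' pipeline: rotation is shown only in 90° steps and the blur is a fixed 3×3 box filter, to keep the example dependency-free.

```python
import numpy as np

def augment(img, rng=None):
    """Return a randomly brightened, rotated (90° steps), box-blurred copy
    of a float image with values in [0, 1]."""
    rng = np.random.default_rng(rng)
    out = np.clip(img * rng.uniform(0.7, 1.3), 0.0, 1.0)   # random brightness
    out = np.rot90(out, k=int(rng.integers(0, 4)))          # random rotation
    h, w = out.shape
    padded = np.pad(out, 1, mode="edge")                    # 3x3 box blur
    out = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return out

img = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
aug = augment(img, rng=0)
```

Applying such transforms repeatedly with fresh random draws is what expands a small crawled dataset toward the 10,000-image scale described above.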
4
Inside out: transforming images of lab-grown plants for machine learning applications in agriculture. Front Artif Intell 2023; 6:1200977. [PMID: 37483870 PMCID: PMC10358354 DOI: 10.3389/frai.2023.1200977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 06/05/2023] [Indexed: 07/25/2023]
Abstract
Introduction Machine learning tasks often require a significant amount of training data for the resultant network to perform suitably for a given problem in any domain. In agriculture, dataset sizes are further limited by phenotypical differences between two plants of the same genotype, often as a result of different growing conditions. Synthetically-augmented datasets have shown promise in improving existing models when real data is not available. Methods In this paper, we employ a contrastive unpaired translation (CUT) generative adversarial network (GAN) and simple image processing techniques to translate indoor plant images to appear as field images. While we train our network to translate an image containing only a single plant, we show that our method is easily extendable to produce multiple-plant field images. Results Furthermore, we use our synthetic multi-plant images to train several YoloV5 nano object detection models to perform the task of plant detection and measure the accuracy of the model on real field data images. Discussion The inclusion of training data generated by the CUT-GAN leads to better plant detection performance compared to a network trained solely on real data.
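The extension from single-plant to multi-plant field images can be pictured as a mask-based paste of translated plant crops onto a background canvas. The sketch below is a simplified illustration under that assumption — the function name, mask rule, and compositing are not the paper's exact procedure:

```python
import numpy as np

def paste_plants(canvas, plant, mask, positions):
    """Composite one plant crop onto the canvas at several (y, x) positions,
    copying only the pixels covered by the plant's foreground mask."""
    out = canvas.copy()
    ph, pw = plant.shape
    for y, x in positions:
        region = out[y:y + ph, x:x + pw]
        out[y:y + ph, x:x + pw] = np.where(mask > 0, plant, region)
    return out

canvas = np.zeros((32, 32))          # toy "field" background
plant = np.ones((8, 8))              # toy translated plant crop
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1                   # toy foreground mask
field = paste_plants(canvas, plant, mask, [(0, 0), (16, 16)])
```

In a real pipeline each paste would use a different GAN-translated crop and a segmentation-derived mask, and the paste positions would double as ground-truth boxes for the YoloV5 detector.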
5
Animal Species Recognition with Deep Convolutional Neural Networks from Ecological Camera Trap Images. Animals (Basel) 2023; 13:1526. [PMID: 37174563 PMCID: PMC10177479 DOI: 10.3390/ani13091526] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/07/2023] [Revised: 04/16/2023] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Accurate identification of animal species is necessary to understand biodiversity richness, monitor endangered species, and study the impact of climate change on species distribution within a specific region. Camera traps are a passive monitoring technique that generates millions of ecological images. The vast number of images makes automated ecological analysis essential, given that manual assessment of large datasets is laborious, time-consuming, and expensive. Deep learning networks have advanced in the last few years to solve object and species identification tasks in the computer vision domain, providing state-of-the-art results. In our work, we trained and tested machine learning models to classify three animal groups (snakes, lizards, and toads) from camera trap images. We experimented with two pretrained models, VGG16 and ResNet50, and a self-trained convolutional neural network (CNN-1) with varying CNN layers and augmentation parameters. For multiclass classification, CNN-1 achieved 72% accuracy, whereas VGG16 reached 87% and ResNet50 attained 86%. These results demonstrate that the transfer learning approach outperforms the self-trained model. The models showed promising results in identifying species, especially those that are challenging to identify because of body size and surrounding vegetation.
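Transfer learning as used above (pretrained VGG16/ResNet50 backbones) reduces, in its simplest form, to training a small classifier head on frozen backbone features. A minimal NumPy sketch of that idea, with synthetic feature vectors standing in for the backbone output (the real setup would extract features with the pretrained CNN):

```python
import numpy as np

def train_linear_head(feats, labels, n_classes, lr=0.5, epochs=300):
    """Softmax regression on frozen features via full-batch gradient descent —
    the 'train only the head' essence of transfer learning."""
    W = np.zeros((feats.shape[1], n_classes))
    y = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
        W -= lr * feats.T @ (p - y) / len(feats)     # cross-entropy gradient
    return W

rng = np.random.default_rng(0)
# synthetic "backbone features": two well-separated classes, 8-D
feats = np.vstack([rng.normal(+2.0, 1.0, (50, 8)), rng.normal(-2.0, 1.0, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
W = train_linear_head(feats, labels, 2)
acc = ((feats @ W).argmax(axis=1) == labels).mean()
```

Because the backbone already encodes generic visual structure, only the small matrix `W` must be learned from the limited camera-trap data, which is why the pretrained models outperform a CNN trained from scratch here.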
6
A Multiscale Polyp Detection Approach for GI Tract Images Based on Improved DenseNet and Single-Shot Multibox Detector. Diagnostics (Basel) 2023; 13:733. [PMID: 36832221 PMCID: PMC9955440 DOI: 10.3390/diagnostics13040733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 02/07/2023] [Accepted: 02/09/2023] [Indexed: 02/17/2023]
Abstract
Small bowel polyps exhibit variations in color, shape, morphology, texture, and size, as well as in the presence of artifacts, irregular polyp borders, and low illumination inside the gastrointestinal (GI) tract. Recently, researchers have developed many highly accurate polyp detection models based on one-stage or two-stage object detectors for wireless capsule endoscopy (WCE) and colonoscopy images. However, their implementation requires high computational power and memory resources, sacrificing speed for an improvement in precision. Although the single-shot multibox detector (SSD) has proven effective in many medical imaging applications, its detection ability for small polyp regions remains weak due to the lack of complementary information between low- and high-level feature layers. Our aim is to consecutively reuse feature maps between layers of the original SSD network. In this paper, we propose an innovative SSD model, DC-SSDNet (densely connected single-shot multibox detector), based on a redesigned version of a dense convolutional network (DenseNet) that emphasizes the interdependence of multiscale pyramidal feature maps. The original VGG-16 backbone of the SSD is replaced with a modified version of DenseNet. The DenseNet-46 front stem is improved to extract highly typical characteristics and contextual information, improving the model's feature extraction ability. The DC-SSDNet architecture compresses unnecessary convolution layers in each dense block to reduce model complexity. Experimental results show a remarkable improvement of the proposed DC-SSDNet in detecting small polyp regions, achieving an mAP of 93.96% and an F1-score of 90.7% while requiring less computational time.
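Dense connectivity — the feature-reuse mechanism DC-SSDNet borrows from DenseNet — means each layer receives the channel-wise concatenation of the block input and every earlier layer's output. A toy NumPy sketch of the wiring (each "layer" here is just a stand-in function producing one channel, i.e. a growth rate of 1):

```python
import numpy as np

def dense_block(x, layers):
    """DenseNet-style block: layer i consumes the concatenation of the
    input and all previous layers' outputs along the channel axis."""
    feats = [x]
    for layer in layers:
        feats.append(layer(np.concatenate(feats, axis=0)))
    return np.concatenate(feats, axis=0)

# two stand-in "layers", each mapping (C, H, W) -> (1, H, W)
layers = [lambda t: t.mean(axis=0, keepdims=True)] * 2
x = np.ones((3, 4, 4))                 # channels-first toy feature map
out = dense_block(x, layers)           # channels: 3 + 1 + 1 = 5
```

The channel count grows by the growth rate at every layer, which is why later features keep direct access to earlier, higher-resolution information — the "complementary information between low- and high-level layers" the abstract refers to.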
7
Intraclass Image Augmentation for Defect Detection Using Generative Adversarial Neural Networks. Sensors (Basel) 2023; 23:1861. [PMID: 36850460 PMCID: PMC9967620 DOI: 10.3390/s23041861] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/27/2022] [Revised: 01/19/2023] [Accepted: 02/03/2023] [Indexed: 05/27/2023]
Abstract
Surface defect identification based on computer vision algorithms often suffers from inadequate generalization ability due to large intraclass variation. Diversity in lighting conditions, noise components, and defect size, shape, and position makes the problem challenging. To solve it, this paper develops a pixel-level image augmentation method based on image-to-image translation with generative adversarial networks (GANs) conditioned on fine-grained labels. The GAN model proposed in this work, referred to as Magna-Defect-GAN, is capable of taking control of the image generation process and producing image samples that are highly realistic in terms of variations. First, a surface defect dataset based on the magnetic particle inspection (MPI) method is acquired in a controlled environment. Then, the Magna-Defect-GAN model is trained, and new synthetic image samples with large intraclass variations are generated. These synthetic samples artificially inflate the training dataset in terms of intraclass diversity. Finally, the enlarged dataset is used to train a defect identification model. Experimental results demonstrate that the Magna-Defect-GAN model can generate realistic, high-resolution surface defect images up to a resolution of 512 × 512 in a controlled manner. We also show that this augmentation method can boost accuracy and is easily adapted to other surface defect identification models.
8
Anomaly Detection of GAN Industrial Image Based on Attention Feature Fusion. Sensors (Basel) 2022; 23:355. [PMID: 36616953 PMCID: PMC9824468 DOI: 10.3390/s23010355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 12/25/2022] [Accepted: 12/26/2022] [Indexed: 06/17/2023]
Abstract
As demand for high-quality industrial products continues to grow, image anomaly detection for industrial products has become a research hotspot of significant importance. Industrial manufacturers are increasingly aware that product parts may contain flaws and defects, and industrial product image anomalies exhibit category diversity, sample scarcity, and unpredictable variation, raising the requirements for image anomaly detection. For this reason, we propose an industrial image anomaly detection method that applies a generative adversarial network based on attention feature fusion. To capture richer image channel features, we add attention feature fusion to an encoder-decoder architecture, fusing encoder and decoder feature vectors of the same dimension through skip connections. During training, we use random cut-paste image augmentation, which improves the diversity of the datasets. We present extensive experiments on the public MVTec industrial inspection dataset, which show that the proposed method achieves a higher AUC, with an overall improvement of 4.1%. Finally, we realize pixel-level anomaly localization on the industrial dataset, illustrating the feasibility and effectiveness of the method.
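The random cut-paste augmentation used during training can be sketched very compactly: copy a random patch of the image and paste it at another random location, creating a plausible local irregularity. A minimal NumPy version (the patch size and uniform sampling here are assumptions, not the paper's exact settings):

```python
import numpy as np

def cut_paste(img, patch=8, rng=None):
    """Cut a random square patch and paste it at another random location."""
    rng = np.random.default_rng(rng)
    h, w = img.shape
    ys = rng.integers(0, h - patch + 1)   # source corner
    xs = rng.integers(0, w - patch + 1)
    yd = rng.integers(0, h - patch + 1)   # destination corner
    xd = rng.integers(0, w - patch + 1)
    out = img.copy()
    out[yd:yd + patch, xd:xd + patch] = img[ys:ys + patch, xs:xs + patch]
    return out

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
aug = cut_paste(img, rng=0)
```

Because the pasted content comes from the same image, the result looks locally realistic yet globally inconsistent — useful pseudo-anomalies when real defect samples are scarce.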
9
Influence of contrast and texture based image modifications on the performance and attention shift of U-Net models for brain tissue segmentation. Front Neuroimaging 2022; 1:1012639. [PMID: 37555149 PMCID: PMC10406260 DOI: 10.3389/fnimg.2022.1012639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 10/12/2022] [Indexed: 08/10/2023]
Abstract
Contrast and texture modifications applied at training or test time have recently shown promising results in enhancing the generalization performance of deep learning segmentation methods in medical image analysis. However, this phenomenon has not been investigated in depth. In this study, we examined it in a controlled experimental setting, using datasets from the Human Connectome Project and a large set of simulated MR protocols, in order to mitigate data confounders and investigate possible explanations for why model performance changes when different levels of contrast- and texture-based modifications are applied. Our experiments confirm previous findings regarding the improved performance of models subjected to contrast and texture modifications during training and/or test time, but further show the interplay when these operations are combined, as well as the regimes of model improvement or worsening across scanning parameters. Furthermore, our findings demonstrate a spatial attention shift in trained models, occurring at different levels of model performance and varying in relation to the type of applied image modification.
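A representative contrast modification of the kind studied above is a simple gamma adjustment applied at training or test time. This is illustrative only — the study's actual set of contrast and texture operations is richer — but it shows how a single parameter sweeps an image between low- and high-contrast regimes:

```python
import numpy as np

def adjust_contrast(img, gamma):
    """Gamma adjustment on a [0, 1] image: gamma < 1 lifts dark intensities
    (compressing contrast in dark regions), gamma > 1 does the opposite."""
    return np.clip(img, 0.0, 1.0) ** gamma

img = np.linspace(0.0, 1.0, 256).reshape(16, 16)   # toy intensity ramp
low = adjust_contrast(img, 0.5)    # brighter, flatter dark regions
high = adjust_contrast(img, 2.0)   # darker, steeper dark regions
```

Sweeping `gamma` over a range during training is one way to expose a segmentation model to the contrast variation it will meet across MR protocols.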
10
Study on Accuracy Improvement of Slope Failure Region Detection Using Mask R-CNN with Augmentation Method. Sensors (Basel) 2022; 22:6412. [PMID: 36080871 PMCID: PMC9460332 DOI: 10.3390/s22176412] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/27/2022] [Revised: 08/17/2022] [Accepted: 08/23/2022] [Indexed: 05/27/2023]
Abstract
We propose an automatic detection method for slope failure regions that uses Mask R-CNN, a deep-learning-based instance segmentation method, to improve the efficiency of damage assessment after a slope failure disaster. Research on detecting landslides with deep learning is limited, and the lack of training data is an important issue, as aerial photographs are not taken frequently enough during a disaster. This study uses CutMix-based augmentation to improve detection accuracy and compares the detection results obtained with multiple augmentation patterns. When the image data were augmented while maintaining the shape of the slope failure region, the recall, which indicates how few failures are overlooked in the prediction results, reached 0.701, an increase of 0.186 over the case without augmentation. The F1 score was 0.740, an increase of 0.139, and the other indicators also improved. The proposed method is therefore highly useful for grasping slope failure regions, owing to its high detection accuracy.
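The CutMix-based augmentation mentioned above follows a standard recipe: paste a randomly sized box from one image into another and mix the labels in proportion to the pasted area. A minimal NumPy sketch of vanilla CutMix (the paper's shape-preserving variant adds constraints on top of this):

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, rng=None):
    """CutMix: paste a random box from img_b into img_a and mix the labels
    in proportion to the pasted area."""
    rng = np.random.default_rng(rng)
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)                                 # mixing ratio
    bh, bw = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    y = rng.integers(0, h - bh + 1)
    x = rng.integers(0, w - bw + 1)
    out = img_a.copy()
    out[y:y + bh, x:x + bw] = img_b[y:y + bh, x:x + bw]
    lam = 1.0 - bh * bw / (h * w)            # actual ratio after rounding
    label = lam * np.asarray(label_a, float) + (1.0 - lam) * np.asarray(label_b, float)
    return out, label

a, b = np.zeros((32, 32)), np.ones((32, 32))
mixed, label = cutmix(a, b, [1.0, 0.0], [0.0, 1.0], rng=0)
```

For segmentation, the same box operation is applied to the mask pair instead of mixing scalar labels, which is what lets the slope failure region's shape be preserved.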
11
Automatic Detection of Cracks on Concrete Surfaces in the Presence of Shadows. Sensors (Basel) 2022; 22:3662. [PMID: 35632070 PMCID: PMC9145296 DOI: 10.3390/s22103662] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/08/2022] [Revised: 05/04/2022] [Accepted: 05/09/2022] [Indexed: 11/18/2022]
Abstract
Deep learning-based methods, especially convolutional neural networks, have been developed to automatically process the images of concrete surfaces for crack identification tasks. Although deep learning-based methods claim very high accuracy, they often ignore the complexity of the image collection process. Real-world images are often impacted by complex illumination conditions, shadows, the randomness of crack shapes and sizes, blemishes, and concrete spall. Published literature and available shadow databases are oriented towards images taken in laboratory conditions. In this paper, we explore the complexity of image classification for concrete crack detection in the presence of demanding illumination conditions. Challenges associated with the application of deep learning-based methods for detecting concrete cracks in the presence of shadows are elaborated on in this paper. Novel shadow augmentation techniques are developed to increase the accuracy of automatic detection of concrete cracks.
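The shadow augmentation idea can be illustrated with a minimal synthetic cast shadow: darken all pixels on one side of a random line. The paper's actual techniques are more sophisticated; this NumPy sketch only conveys the principle of injecting shadow-like illumination variation into training images:

```python
import numpy as np

def add_shadow(img, strength=0.5, rng=None):
    """Darken the pixels on one side of a random line to mimic a cast shadow."""
    rng = np.random.default_rng(rng)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    a, b = rng.uniform(-1.0, 1.0, 2)                 # random line orientation
    c = rng.uniform(-0.5, 0.5) * (abs(a) * w + abs(b) * h)  # random offset
    mask = a * xx + b * yy < c                       # half-plane shadow region
    out = img.astype(float).copy()
    out[mask] *= 1.0 - strength                      # uniform darkening
    return out

img = np.ones((48, 48))
shadowed = add_shadow(img, strength=0.5, rng=0)
```

Soft shadow edges (e.g. blurring the mask boundary) and polygonal shadow shapes are natural extensions of the same idea.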
12
Occlusion Robust Wheat Ear Counting Algorithm Based on Deep Learning. Front Plant Sci 2021; 12:645899. [PMID: 34177976 PMCID: PMC8226325 DOI: 10.3389/fpls.2021.645899] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/24/2020] [Accepted: 05/19/2021] [Indexed: 05/17/2023]
Abstract
Counting the number of wheat ears in images taken under natural light is an important way to evaluate crop yield and is thus of great significance to modern intelligent agriculture. However, wheat ears are densely distributed, so occlusion and overlap appear in almost every wheat image. Traditional image processing methods struggle with occlusion because they lack high-level semantic features, while existing deep-learning-based counting methods do not handle occlusion efficiently. This article proposes an improved EfficientDet-D0 object detection model for wheat ear counting that focuses on solving occlusion. First, transfer learning is employed in pre-training the model's backbone network to extract high-level semantic features of wheat ears. Second, an image augmentation method, Random-Cutout, is proposed, in which rectangles are selected and erased according to the number and size of the wheat ears in the images to simulate occlusion in real wheat images. Finally, a convolutional block attention module (CBAM) is added to the EfficientDet-D0 model after the backbone, which makes the model refine its features, pay more attention to the wheat ears, and suppress useless background information. Extensive experiments show that the counting accuracy of the improved EfficientDet-D0 model reaches 94%, about 2% higher than the original model, and the false detection rate is 5.8%, the lowest among the compared methods.
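The Random-Cutout augmentation described above can be sketched as: erase a few rectangles whose sizes are drawn from the annotated ear boxes, so the occlusion synthesized during training resembles one ear hiding another. A minimal NumPy version (how the paper samples the count and positions is an assumption here):

```python
import numpy as np

def random_cutout(img, ear_boxes, n=2, rng=None):
    """Erase n rectangles sized like the given (height, width) ear boxes,
    simulating the occlusion seen in real wheat images."""
    rng = np.random.default_rng(rng)
    out = img.copy()
    h, w = img.shape
    for _ in range(n):
        bh, bw = ear_boxes[rng.integers(len(ear_boxes))]  # size from a real box
        y = rng.integers(0, h - bh + 1)
        x = rng.integers(0, w - bw + 1)
        out[y:y + bh, x:x + bw] = 0.0                     # erase the region
    return out

img = np.ones((64, 64))
occluded = random_cutout(img, ear_boxes=[(6, 10), (8, 8)], n=2)
```

Tying the erased-rectangle sizes to real ear boxes is what distinguishes this from generic Cutout, which erases fixed-size squares regardless of object scale.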
13
Ophthalmologist-Level Classification of Fundus Disease With Deep Neural Networks. Transl Vis Sci Technol 2020; 9:39. [PMID: 32855843 PMCID: PMC7424930 DOI: 10.1167/tvst.9.2.39] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/08/2019] [Accepted: 05/31/2020] [Indexed: 01/03/2023]
Abstract
Purpose To implement the classification of fundus diseases using deep convolutional neural networks (CNNs) trained end-to-end directly from fundus images: the only inputs are pixels and disease labels, and the output is a probability distribution of a fundus image belonging to one of 18 fundus diseases. Methods Automated classification of fundus diseases from images is a challenging task owing to the fine-grained variability in the appearance of fundus lesions. Deep CNNs show potential for general and highly variable tasks across many fine-grained object categories, but they need large amounts of labeled samples, and the available fundus images, especially labeled samples, are limited and cannot satisfy the training requirement. Therefore, image augmentations such as rotation, scaling, and noising are implemented to enlarge the training dataset. We fine-tune the ResNet CNN architecture on 120,100 fundus images covering 18 different diseases and use it to classify fundus images into the corresponding diseases. Results The performance is tested against two board-certified ophthalmologists. The CNN achieves classification accuracy on par with the experts. Conclusions A deep CNN is capable of predicting fundus diseases from fundus images, which can enhance the efficiency of the diagnostic process and promote better visual outcomes. Outfitted with deep neural networks, mobile devices can potentially extend the reach of ophthalmologists outside the clinic and provide low-cost universal access to vital diagnostic care. Translational Relevance This article implements automatic prediction of fundus diseases, a task previously performed by ophthalmologists.
14
Grape Leaf Disease Identification Using Improved Deep Convolutional Neural Networks. Front Plant Sci 2020; 11:1082. [PMID: 32760419 PMCID: PMC7373759 DOI: 10.3389/fpls.2020.01082] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/12/2020] [Accepted: 06/30/2020] [Indexed: 05/18/2023]
Abstract
Anthracnose, brown spot, mites, black rot, downy mildew, and leaf blight are six common grape leaf pests and diseases, which cause severe economic losses to the grape industry. Timely diagnosis and accurate identification of grape leaf diseases are decisive for controlling the spread of disease and ensuring the healthy development of the grape industry. This paper proposes a novel recognition approach that is based on improved convolutional neural networks for the diagnoses of grape leaf diseases. First, based on 4,023 images collected in the field and 3,646 images collected from public data sets, a data set of 107,366 grape leaf images is generated via image enhancement techniques. Afterward, Inception structure is applied for strengthening the performance of multi-dimensional feature extraction. In addition, a dense connectivity strategy is introduced to encourage feature reuse and strengthen feature propagation. Ultimately, a novel CNN-based model, namely, DICNN, is built and trained from scratch. It realizes an overall accuracy of 97.22% under the hold-out test set. Compared to GoogLeNet and ResNet-34, the recognition accuracy increases by 2.97% and 2.55%, respectively. The experimental results demonstrate that the proposed model can efficiently recognize grape leaf diseases. Meanwhile, this study explores a new approach for the rapid and accurate diagnosis of plant diseases that establishes a theoretical foundation for the application of deep learning in the field of agricultural information.
15
Breast cancer detection using synthetic mammograms from generative adversarial networks in convolutional neural networks. J Med Imaging (Bellingham) 2019; 6:031411. [PMID: 30915386 PMCID: PMC6430964 DOI: 10.1117/1.jmi.6.3.031411] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/04/2018] [Accepted: 02/22/2019] [Indexed: 01/22/2023]
Abstract
The convolutional neural network (CNN) is a promising technique to detect breast cancer based on mammograms. Training a CNN from scratch, however, requires a large amount of labeled data, a requirement that is usually infeasible for some kinds of medical image data such as mammographic tumor images. Because improving the performance of a CNN classifier requires more training data, the creation of new training images, image augmentation, is one solution to this problem. We applied a generative adversarial network (GAN) to generate synthetic mammographic images from the digital database for screening mammography (DDSM). From the DDSM, we cropped two sets of regions of interest (ROIs): normal and abnormal (cancer/tumor). These ROIs were used to train the GAN, and the GAN then generated synthetic images. For comparison with affine transformation augmentation methods such as rotation, shifting, and scaling, we used six groups of ROIs [three simple groups: affine augmented, GAN synthetic, and real (original), and three mixture groups combining any two of the three simple groups], each used to train a CNN classifier from scratch. We validated classification outcomes using real ROIs that were not used in training. Our results show that, when classifying the normal and abnormal ROIs from the DDSM, adding GAN-generated ROIs to the training data helps the classifier prevent overfitting, and in validation accuracy the GAN performs about 3.6% better than affine transformations for image augmentation. Therefore, GAN could be an ideal augmentation approach. However, images augmented by GAN or affine transformation cannot substitute for real images when training CNN classifiers, because the absence of real images in the training set causes overfitting.