1. Bhalla D, Rangarajan K, Chandra T, Banerjee S, Arora C. Reproducibility and Explainability of Deep Learning in Mammography: A Systematic Review of Literature. Indian J Radiol Imaging 2024; 34:469-487. [PMID: 38912238] [PMCID: PMC11188703] [DOI: 10.1055/s-0043-1775737]
Abstract
Background Although abundant literature is currently available on the use of deep learning for breast cancer detection in mammography, the quality of such literature is widely variable. Purpose To evaluate published literature on breast cancer detection in mammography for reproducibility and to ascertain best practices for model design. Methods The PubMed and Scopus databases were searched to identify records that described the use of deep learning to detect lesions or classify images into cancer or noncancer. A modification of the Quality Assessment of Diagnostic Accuracy Studies (mQUADAS-2) tool was developed for this review and applied to the included studies. Results of the reported studies (area under the receiver operating characteristic [ROC] curve [AUC], sensitivity, specificity) were recorded. Results A total of 12,123 records were screened, of which 107 met the inclusion criteria. Training and test datasets, the key idea behind each model architecture, and results were recorded for these studies. Based on the mQUADAS-2 assessment, 103 studies had a high risk of bias due to nonrepresentative patient selection. Four studies were of adequate quality; three of these trained their own model, one used a commercial network, and two used ensemble models. Common strategies for model training included patch classifiers, image classification networks (ResNet in 67%), and object detection networks (RetinaNet in 67%). The highest reported AUC was 0.927 ± 0.008 on a screening dataset, rising to 0.945 (0.919-0.968) on an enriched subset. Higher AUC (0.955) and specificity (98.5%) were reached when combined radiologist and artificial intelligence readings were used than with either alone. None of the studies provided explainability beyond localization accuracy, and none examined the interaction between AI and radiologists in a real-world setting. Conclusion While deep learning holds much promise in mammography interpretation, evaluation in reproducible clinical settings and explainable networks are urgently needed.
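As context for the model-design strategies the review tallies (image-level classifiers such as ResNet were the most common), the following is a minimal, hypothetical sketch of that general approach: fine-tune an ImageNet-pretrained ResNet on mammograms labelled benign vs. malignant and report AUC. The directory layout, class names, and training settings are illustrative assumptions of this sketch, not taken from any study included in the review.

    # Sketch only: fine-tune a pretrained ResNet for mammogram classification.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms
    from sklearn.metrics import roc_auc_score

    device = "cuda" if torch.cuda.is_available() else "cpu"

    tfm = transforms.Compose([
        transforms.Grayscale(num_output_channels=3),  # mammograms are single-channel
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    # Hypothetical layout: mammo/{train,val}/{benign,malignant}/*.png
    # ImageFolder sorts class folders alphabetically, so class index 1 = "malignant".
    train_ds = datasets.ImageFolder("mammo/train", transform=tfm)
    val_ds = datasets.ImageFolder("mammo/val", transform=tfm)
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size=32)

    # ImageNet-pretrained backbone with a single-logit head (malignant vs. benign)
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 1)
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    for epoch in range(5):
        model.train()
        for x, y in train_dl:
            x, y = x.to(device), y.float().to(device)
            opt.zero_grad()
            loss_fn(model(x).squeeze(1), y).backward()
            opt.step()

    model.eval()
    scores, labels = [], []
    with torch.no_grad():
        for x, y in val_dl:
            scores += torch.sigmoid(model(x.to(device)).squeeze(1)).cpu().tolist()
            labels += y.tolist()
    print("validation AUC:", roc_auc_score(labels, scores))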
Affiliation(s)
- Deeksha Bhalla
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Krithika Rangarajan
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Tany Chandra
- Department of Radiodiagnosis, All India Institute of Medical Sciences, New Delhi, India
- Subhashis Banerjee
- Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India
- Chetan Arora
- Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi, India
2. Guo Y, Zhang H, Yuan L, Chen W, Zhao H, Yu QQ, Shi W. Machine learning and new insights for breast cancer diagnosis. J Int Med Res 2024; 52:3000605241237867. [PMID: 38663911] [PMCID: PMC11047257] [DOI: 10.1177/03000605241237867]
Abstract
Breast cancer (BC) is the most prominent form of cancer among females worldwide. Current methods of BC detection include X-ray mammography, ultrasound, computed tomography, magnetic resonance imaging, positron emission tomography and breast thermography. More recently, machine learning (ML) tools have been increasingly employed in diagnostic medicine for their high efficiency in detection and intervention. Imaging features and the associated mathematical analyses can be used to generate ML models that stratify, differentiate and detect benign and malignant breast lesions. Given these marked advantages, radiomics is a frequently used tool in recent research and clinical practice. Artificial neural networks and deep learning (DL) are newer forms of ML that evaluate data using computer simulation of the human brain. DL directly processes unstructured information, such as images, sounds and language, and performs precise clinical image stratification, medical record analysis and tumour diagnosis. This review summarizes prior investigations on the application of medical imaging for the detection and intervention of BC using radiomics, ML and DL, with the aim of guiding scientists in the use of artificial intelligence and ML in research and the clinic.
Affiliation(s)
- Ya Guo
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Heng Zhang
- Department of Laboratory Medicine, Shandong Daizhuang Hospital, Jining, Shandong Province, China
- Leilei Yuan
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Weidong Chen
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Haibo Zhao
- Department of Oncology, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Qing-Qing Yu
- Phase I Clinical Research Centre, Jining No.1 People’s Hospital, Shandong First Medical University, Jining, Shandong Province, China
- Wenjie Shi
- Molecular and Experimental Surgery, University Clinic for General-, Visceral-, Vascular- and Transplantation Surgery, Medical Faculty University Hospital Magdeburg, Otto-von Guericke University, Magdeburg, Germany
3. Sobiecki A, Hadjiiski LM, Chan HP, Samala RK, Zhou C, Stojanovska J, Agarwal PP. Detection of Severe Lung Infection on Chest Radiographs of COVID-19 Patients: Robustness of AI Models across Multi-Institutional Data. Diagnostics (Basel) 2024; 14:341. [PMID: 38337857] [PMCID: PMC10855789] [DOI: 10.3390/diagnostics14030341]
Abstract
The diagnosis of severe COVID-19 lung infection is important because it carries a higher risk for the patient and requires prompt treatment with oxygen therapy and hospitalization, whereas patients with less severe lung infection often remain under observation. Severe infections are also more likely to leave long-standing residual changes in the lungs and may need follow-up imaging. We developed deep learning neural network models for classifying severe vs. non-severe lung infection in COVID-19 patients on chest radiographs (CXR). A deep learning U-Net model was developed to segment the lungs. Inception-v1 and Inception-v4 models were trained for the classification of severe vs. non-severe COVID-19 infection. Four CXR datasets from multi-country and multi-institutional sources were used to develop and evaluate the models. The combined dataset consisted of 5748 cases and 6193 CXR images with physicians' severity ratings as the reference standard. The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. We studied the reproducibility of classification performance using different combinations of training and validation data sets, and evaluated the generalizability of the trained deep learning models using both independent internal and external test sets. On the independent test sets, the Inception-v1-based models achieved AUCs ranging from 0.81 ± 0.02 to 0.84 ± 0.0, while the Inception-v4 models achieved AUCs ranging from 0.85 ± 0.06 to 0.89 ± 0.01. These results demonstrate the promise of deep learning models for differentiating COVID-19 patients with severe from non-severe lung infection on chest radiographs.
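A minimal sketch of the two-stage idea described here, under assumptions of mine rather than the authors' code: a placeholder lung mask restricts the input, and torchvision's GoogLeNet (an Inception-v1 implementation) produces a single severity logit. In the study the mask came from a trained U-Net and the networks were trained and tested on the multi-institutional CXR data.

    # Sketch only: lung masking followed by severe/non-severe classification.
    import torch
    import torch.nn as nn
    from torchvision import models

    def apply_lung_mask(cxr: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Placeholder for the U-Net stage: zero out everything outside the lungs.
        return cxr * mask

    # Inception-v1-style classifier with a single logit: severe vs. non-severe
    model = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 1)
    model.eval()

    cxr = torch.rand(4, 3, 224, 224)                   # toy chest radiographs
    mask = (torch.rand(4, 1, 224, 224) > 0.5).float()  # toy lung masks
    with torch.no_grad():
        prob_severe = torch.sigmoid(model(apply_lung_mask(cxr, mask)).squeeze(1))
    print(prob_severe)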
Affiliation(s)
- André Sobiecki
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Lubomir M. Hadjiiski
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Ravi K. Samala
- Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA
- Chuan Zhou
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
- Prachi P. Agarwal
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA
4. Demircioğlu A. The effect of data resampling methods in radiomics. Sci Rep 2024; 14:2858. [PMID: 38310165] [PMCID: PMC10838284] [DOI: 10.1038/s41598-024-53491-5]
Abstract
Radiomic datasets can be class-imbalanced, for instance, when the prevalence of diseases varies notably, meaning that the number of positive samples is much smaller than that of negative samples. In these cases, the majority class may dominate the model's training and thus negatively affect the model's predictive performance, leading to bias. Therefore, resampling methods are often utilized to class-balance the data. However, several resampling methods exist, and neither their relative predictive performance nor their impact on feature selection has been systematically analyzed. In this study, we aimed to measure the impact of nine resampling methods on the predictive performance of radiomic models using fifteen publicly available datasets. Furthermore, we evaluated the agreement and similarity of the sets of selected features. Our results show that applying resampling methods did not improve predictive performance on average. On specific datasets, slight improvements in predictive performance (+0.015 in AUC) could be seen. A considerable disagreement on the set of selected features was observed (only 28.7% of features agreed), which strongly impedes feature interpretability. However, the selected features are similar when considering their correlation (82.9% of features correlated on average).
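The comparison the study makes can be reproduced in miniature with scikit-learn and imbalanced-learn, as in the sketch below: the data are synthetic stand-ins for radiomic features, and SMOTE is just one of the nine resampling methods the paper examined. Resampling is placed inside an imblearn pipeline so that only the training folds of each cross-validation split are balanced.

    # Sketch only: compare AUC with and without SMOTE resampling.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # imblearn's Pipeline supports samplers

    X, y = make_classification(n_samples=400, n_features=50, weights=[0.9, 0.1],
                               random_state=0)  # stand-in for class-imbalanced radiomic features

    plain = LogisticRegression(max_iter=1000)
    smote = Pipeline([("smote", SMOTE(random_state=0)),
                      ("clf", LogisticRegression(max_iter=1000))])

    auc_plain = cross_val_score(plain, X, y, cv=5, scoring="roc_auc").mean()
    auc_smote = cross_val_score(smote, X, y, cv=5, scoring="roc_auc").mean()
    print(f"AUC without resampling: {auc_plain:.3f}")
    print(f"AUC with SMOTE:         {auc_smote:.3f}")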
Affiliation(s)
- Aydin Demircioğlu
- Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.
5. Xu L, Chen J, Qiu K, Yang F, Wu W. Artificial intelligence for detecting temporomandibular joint osteoarthritis using radiographic image data: A systematic review and meta-analysis of diagnostic test accuracy. PLoS One 2023; 18:e0288631. [PMID: 37450501] [PMCID: PMC10348514] [DOI: 10.1371/journal.pone.0288631]
Abstract
In this review, we assessed the diagnostic efficiency of artificial intelligence (AI) models in detecting temporomandibular joint osteoarthritis (TMJOA) from radiographic imaging data. Following the PRISMA guidelines, a systematic review of studies published between January 2010 and January 2023 was conducted using PubMed, Web of Science, Scopus, and Embase. Articles on the accuracy of AI in detecting TMJOA or degenerative changes on radiographic imaging were selected, and the characteristics and diagnostic information of each article were extracted. The quality of studies was assessed with the QUADAS-2 tool. Pooled sensitivity, specificity, and the summary receiver operating characteristic curve (SROC) were calculated. Of 513 records identified through the database search, six met the inclusion criteria. The pooled sensitivity, specificity, and area under the curve (AUC) were 80%, 90%, and 92%, respectively. Substantial heterogeneity between AI models arose mainly from imaging modality, ethnicity, sex, AI technique, and sample size. AI models therefore appear to have considerable potential for diagnosing TMJOA automatically from radiographic images, although further studies are needed to evaluate them more thoroughly.
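For orientation only, the sketch below shows the simplest way of pooling sensitivity and specificity across studies from 2x2 counts; the counts are invented, and the review itself used formal meta-analytic (SROC) methods rather than this naive weighting.

    # Toy illustration: weighted pooling of per-study sensitivity/specificity.
    import numpy as np

    # columns: true positives, false negatives, true negatives, false positives
    studies = np.array([
        [40,  8, 90, 12],
        [25,  7, 60,  5],
        [55, 15, 80,  9],
    ])
    tp, fn, tn, fp = studies.T.astype(float)

    sens = tp / (tp + fn)        # per-study sensitivity
    spec = tn / (tn + fp)        # per-study specificity
    w_sens = tp + fn             # weight by number of diseased cases
    w_spec = tn + fp             # weight by number of non-diseased cases

    print("pooled sensitivity:", np.average(sens, weights=w_sens).round(3))
    print("pooled specificity:", np.average(spec, weights=w_spec).round(3))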
Affiliation(s)
- Liang Xu
- The School of Stomatology, Fujian Medical University, Fuzhou, Fujian, China
- Department of Stomatology, The First Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, China
- Jiang Chen
- The School of Stomatology, Fujian Medical University, Fuzhou, Fujian, China
- School and Hospital of Stomatology, Fujian Medical University, Fuzhou, Fujian, China
- Kaixi Qiu
- Fuzhou No. 1 Hospital Affiliated with Fujian Medical University, Fuzhou, Fujian, China
- Feng Yang
- School and Hospital of Stomatology, Fujian Medical University, Fuzhou, Fujian, China
- Weiliang Wu
- The School of Stomatology, Fujian Medical University, Fuzhou, Fujian, China
6. Yoshida K, Tanabe Y, Nishiyama H, Matsuda T, Toritani H, Kitamura T, Sakai S, Watamori K, Takao M, Kimura E, Kido T. Feasibility of Bone Mineral Density and Bone Microarchitecture Assessment Using Deep Learning With a Convolutional Neural Network. J Comput Assist Tomogr 2023; 47:467-474. [PMID: 37185012] [PMCID: PMC10184800] [DOI: 10.1097/rct.0000000000001437]
Abstract
OBJECTIVES We evaluated the feasibility of using deep learning with a convolutional neural network to predict bone mineral density (BMD) and bone microarchitecture from conventional computed tomography (CT) images acquired by multivendor scanners. METHODS We enrolled 402 patients who underwent noncontrast CT examinations, including the L1-L4 vertebrae, and dual-energy x-ray absorptiometry (DXA) examination. Among these, 280 patients (3360 sagittal vertebral images), 70 patients (280 sagittal vertebral images), and 52 patients (208 sagittal vertebral images) were assigned to the training data set for deep learning model development, the validation data set, and the test data set, respectively. Bone mineral density and the trabecular bone score (TBS), an index of bone microarchitecture, were assessed by DXA. BMDDL and TBSDL were predicted by deep learning with a convolutional neural network (ResNet50). Pearson correlation tests assessed the correlation between BMDDL and BMD, and between TBSDL and TBS. The diagnostic performance of BMDDL for osteopenia/osteoporosis and that of TBSDL for bone microarchitecture impairment were evaluated using receiver operating characteristic curve analysis. RESULTS BMDDL and BMD correlated strongly (r = 0.81, P < 0.01), whereas TBSDL and TBS correlated moderately (r = 0.54, P < 0.01). The sensitivity and specificity of BMDDL were 93% and 90% for identifying osteopenia, and 100% and 94% for identifying osteoporosis. The sensitivity and specificity of TBSDL for identifying patients with bone microarchitecture impairment were both 73%. CONCLUSIONS BMDDL and TBSDL derived from conventional CT images could identify patients who should undergo DXA and could serve as a gatekeeper tool for detecting latent osteoporosis/osteopenia or bone microarchitecture impairment.
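A minimal sketch of the regression idea, with random tensors standing in for the study's images and DXA measurements: a ResNet50 backbone with a one-unit head predicts BMD from a sagittal vertebral CT patch, and the prediction is compared with the DXA reference via Pearson correlation.

    # Sketch only: CNN regression of BMD from vertebral patches.
    import torch
    import torch.nn as nn
    from torchvision import models
    from scipy.stats import pearsonr

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 1)   # regression head for BMD (g/cm^2)
    model.eval()

    patches = torch.rand(8, 3, 224, 224)            # toy sagittal vertebral patches
    dxa_bmd = torch.rand(8) * 0.6 + 0.6             # toy DXA reference values

    with torch.no_grad():
        pred_bmd = model(patches).squeeze(1)

    r, p = pearsonr(pred_bmd.numpy(), dxa_bmd.numpy())
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")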
Affiliation(s)
- Shinichiro Sakai
- Orthopedic Surgery, Ehime University Graduate School of Medicine
- Masaki Takao
- Orthopedic Surgery, Ehime University Graduate School of Medicine
7. Li D, Li X, Li S, Qi M, Sun X, Hu G. Relationship between the deep features of the full-scan pathological map of mucinous gastric carcinoma and related genes based on deep learning. Heliyon 2023; 9:e14374. [PMID: 36942252] [PMCID: PMC10023952] [DOI: 10.1016/j.heliyon.2023.e14374]
Abstract
Background Long-term differential expression of disease-associated genes is a crucial driver of pathological change in mucinous gastric carcinoma. There should therefore be a correlation between deep features extracted from pathology-based full-scan images using deep learning and disease-associated gene expression. By exploring this correlation, this study aimed to provide preliminary evidence that long-term differentially expressed (disease-associated) genes lead to subtle changes in disease pathology, and to offer new ideas for precise pathomics analysis and for combined analysis of pathomics and genomics. Methods Full pathological scans, gene sequencing data, and clinical data of patients with mucinous gastric carcinoma were downloaded from the TCGA database. The VGG-16 network architecture was used to construct a binary classification model, both to explore the potential of VGG-16 applications and to extract the deep features of the pathology-based full-scan map. Differential gene expression analysis was performed and a protein-protein interaction network was constructed to screen disease-related core genes. Differential analysis, Lasso regression, and extensive correlation analyses were used to screen for valuable deep features. Finally, correlation analysis was used to determine whether there was a correlation between valuable deep features and disease-related core genes. Results The accuracy of the binary classification model was 0.775 ± 0.129. A total of 24 disease-related core genes were screened, including ASPM, AURKA, AURKB, BUB1, BUB1B, CCNA2, CCNB1, CCNB2, CDCA8, CDK1, CENPF, DLGAP5, KIF11, KIF20A, KIF2C, KIF4A, MELK, PBK, RRM2, TOP2A, TPX2, TTK, UBE2C, and ZWINT. In addition, eight valuable deep features were screened: features 51, 106, 109, 118, 257, 282, 326, and 487. Correlation analysis suggested that the valuable deep features were either positively or negatively correlated with core gene expression. Conclusion These preliminary results support our hypothesis. Deep learning may be an important bridge for the joint analysis of pathomics and genomics and provides preliminary evidence that long-term abnormal gene expression leads to subtle changes in pathology.
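The deep-feature extraction step can be sketched as below, assuming (hypothetically) that the 4096-dimensional activations of VGG-16's penultimate fully connected layer serve as the deep features of a pathology tile. The tiles and the gene-expression vector are random placeholders, and "feature 51" is used only to echo the feature indices reported in the abstract.

    # Sketch only: VGG-16 deep features correlated with gene expression.
    import torch
    import torch.nn as nn
    from torchvision import models
    from scipy.stats import pearsonr

    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
    # Keep everything up to (but not including) the final 1000-class layer -> 4096-d features
    feature_extractor = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten(),
                                      *list(vgg.classifier.children())[:-1])
    feature_extractor.eval()

    tiles = torch.rand(10, 3, 224, 224)       # toy pathology tiles (one per patient)
    with torch.no_grad():
        deep_feats = feature_extractor(tiles)  # shape (10, 4096)

    gene_expression = torch.rand(10)           # toy expression of one core gene
    r, p = pearsonr(deep_feats[:, 51].numpy(), gene_expression.numpy())
    print(f"feature 51 vs. gene expression: r = {r:.2f}, p = {p:.3f}")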
Affiliation(s)
- Ding Li
- Department of Traditional Chinese Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Xiaoyuan Li
- Department of Traditional Chinese Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Shifang Li
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Mengmeng Qi
- Department of Endocrinology, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Xiaowei Sun
- Department of Traditional Chinese Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Guojie Hu
- Department of Traditional Chinese Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
8. Li Y, He Z, Pan J, Zeng W, Liu J, Zeng Z, Xu W, Xu Z, Wang S, Wen C, Zeng H, Wu J, Ma X, Chen W, Lu Y. Atypical architectural distortion detection in digital breast tomosynthesis: a computer-aided detection model with adaptive receptive field. Phys Med Biol 2023; 68. [PMID: 36595312] [DOI: 10.1088/1361-6560/acaba7]
Abstract
Objective. In digital breast tomosynthesis (DBT), architectural distortion (AD) is a breast lesion that is difficult to detect. Compared with typical ADs, which have radial patterns, atypical ADs are even harder to identify, yet most existing computer-aided detection (CADe) models focus on the detection of typical ADs. This study focuses on atypical ADs and develops a deep learning-based CADe model with an adaptive receptive field in DBT. Approach. Our proposed model uses a Gabor filter and a convergence measure to depict the distribution of fibroglandular tissues in DBT slices. Subsequently, two-dimensional (2D) detection is implemented using a deformable-convolution-based deep learning framework, in which an adaptive receptive field is introduced to extract global features in slices. Finally, 2D candidates are aggregated to form the three-dimensional AD detection results. The model is trained on 99 positive cases with ADs and evaluated on 120 AD-positive cases and 100 AD-negative cases. Main results. A convergence-measure-based model and a deep learning model without an adaptive receptive field are reproduced as controls. Their mean true positive fractions (MTPF) over 0.05 to 4 false positives per volume are 0.3846 ± 0.0352 and 0.6501 ± 0.0380, respectively. Our proposed model achieves an MTPF of 0.7148 ± 0.0322, a significant improvement (p < 0.05) over the other two methods. In particular, our model detects more atypical ADs, which is the main contributor to the performance improvement. Significance. The adaptive receptive field helps the model improve atypical AD detection and can help radiologists identify more ADs in breast cancer screening.
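One ingredient named here, the adaptive receptive field, is commonly realized with deformable convolutions. The sketch below is a generic illustration using torchvision.ops.DeformConv2d, not the authors' architecture: a small convolution predicts input-dependent sampling offsets, so the effective receptive field adapts to the feature map.

    # Sketch only: deformable convolution with learned, input-dependent offsets.
    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformableBlock(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, k: int = 3):
            super().__init__()
            # predicts 2 offsets (dx, dy) per kernel position
            self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
            self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            offsets = self.offset_pred(x)   # adaptive sampling grid for this input
            return self.deform(x, offsets)

    feat = torch.rand(1, 64, 128, 128)      # toy feature map from a DBT slice
    block = DeformableBlock(64, 64)
    print(block(feat).shape)                # torch.Size([1, 64, 128, 128])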
Affiliation(s)
- Yue Li
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, People's Republic of China
- Zilong He
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Jiawei Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, People's Republic of China
- Weixiong Zeng
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Jialing Liu
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Zhaodong Zeng
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Weimin Xu
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Zeyuan Xu
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Sina Wang
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Chanjuan Wen
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Hui Zeng
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Jiefang Wu
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Xiangyuan Ma
- Department of Biomedical Engineering, College of Engineering, Shantou University, Shantou, People's Republic of China
- Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, People's Republic of China
- Weiguo Chen
- Department of Radiology, Nanfang Hospital, Southern Medical University, Guangzhou, People's Republic of China
- Yao Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, People's Republic of China
- Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, People's Republic of China
- Shanghai Key Laboratory of Molecular Imaging, Shanghai University of Medicine and Health Sciences, Shanghai, People's Republic of China
9. Atasever S, Azginoglu N, Terzi DS, Terzi R. A comprehensive survey of deep learning research on medical image analysis with focus on transfer learning. Clin Imaging 2023; 94:18-41. [PMID: 36462229] [DOI: 10.1016/j.clinimag.2022.11.003]
Abstract
This survey aims to identify commonly used methods, datasets, future trends, knowledge gaps, constraints, and limitations in the field, providing an overview of current solutions used in medical image analysis in parallel with the rapid developments in transfer learning (TL). Unlike previous studies, this survey groups studies published between January 2017 and February 2021 according to anatomical region and details the modality, medical task, TL method, source data, target data, and public or private datasets used in medical imaging. It also provides readers with detailed information on technical challenges, opportunities, and future research trends. In this way, an overview of recent developments is provided to help researchers select the most effective and efficient methods and access widely used and publicly available medical datasets, as well as the research gaps and limitations of the available literature.
Affiliation(s)
- Sema Atasever
- Computer Engineering Department, Nevsehir Hacı Bektas Veli University, Nevsehir, Turkey.
- Nuh Azginoglu
- Computer Engineering Department, Kayseri University, Kayseri, Turkey.
- Ramazan Terzi
- Computer Engineering Department, Amasya University, Amasya, Turkey.
10. Hadjiiski L, Cha K, Chan HP, Drukker K, Morra L, Näppi JJ, Sahiner B, Yoshida H, Chen Q, Deserno TM, Greenspan H, Huisman H, Huo Z, Mazurchuk R, Petrick N, Regge D, Samala R, Summers RM, Suzuki K, Tourassi G, Vergara D, Armato SG. AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging. Med Phys 2023; 50:e1-e24. [PMID: 36565447] [DOI: 10.1002/mp.16188]
Abstract
Rapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment, and it is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, with developing recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and on issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support.
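One concrete practice in the spirit of such validation recommendations is splitting data at the patient level, so that images from the same patient never appear in both the training and the test set. The toy sketch below (hypothetical data frame, not from the report) does this with scikit-learn's GroupShuffleSplit.

    # Sketch only: patient-level data splitting to avoid optimistic bias.
    import pandas as pd
    from sklearn.model_selection import GroupShuffleSplit

    images = pd.DataFrame({
        "image_id": range(10),
        "patient_id": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
        "label": [0, 0, 1, 1, 0, 1, 1, 0, 0, 1],
    })

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
    train_idx, test_idx = next(splitter.split(images, groups=images["patient_id"]))

    # No patient appears in both partitions.
    assert not set(images.iloc[train_idx]["patient_id"]) & set(images.iloc[test_idx]["patient_id"])
    print("train patients:", sorted(images.iloc[train_idx]["patient_id"].unique()))
    print("test patients: ", sorted(images.iloc[test_idx]["patient_id"].unique()))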
Affiliation(s)
- Lubomir Hadjiiski
- Department of Radiology, University of Michigan, Ann Arbor, Michigan, USA
- Kenny Cha
- U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, Michigan, USA
- Karen Drukker
- Department of Radiology, University of Chicago, Chicago, Illinois, USA
- Lia Morra
- Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
- Janne J Näppi
- 3D Imaging Research, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Berkman Sahiner
- U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Hiroyuki Yoshida
- 3D Imaging Research, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Quan Chen
- Department of Radiation Medicine, University of Kentucky, Lexington, Kentucky, USA
- Thomas M Deserno
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
- Hayit Greenspan
- Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel & Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Henkjan Huisman
- Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Zhimin Huo
- Tencent America, Palo Alto, California, USA
- Richard Mazurchuk
- Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Daniele Regge
- Radiology Unit, Candiolo Cancer Institute, FPO-IRCCS, Candiolo, Italy
- Department of Surgical Sciences, University of Turin, Turin, Italy
- Ravi Samala
- U.S. Food and Drug Administration, Silver Spring, Maryland, USA
- Ronald M Summers
- Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Kenji Suzuki
- Institute of Innovative Research, Tokyo Institute of Technology, Tokyo, Japan
- Daniel Vergara
- Department of Radiology, Yale New Haven Hospital, New Haven, Connecticut, USA
- Samuel G Armato
- Department of Radiology, University of Chicago, Chicago, Illinois, USA
11. Pfob A, Lu SC, Sidey-Gibbons C. Machine learning in medicine: a practical introduction to techniques for data pre-processing, hyperparameter tuning, and model comparison. BMC Med Res Methodol 2022; 22:282. [PMID: 36319956] [PMCID: PMC9624048] [DOI: 10.1186/s12874-022-01758-8]
Abstract
BACKGROUND There is growing enthusiasm for the application of machine learning (ML) and artificial intelligence (AI) techniques to clinical research and practice. However, instructions on how to develop robust high-quality ML and AI in medicine are scarce. In this paper, we provide a practical example of techniques that facilitate the development of high-quality ML systems including data pre-processing, hyperparameter tuning, and model comparison using open-source software and data. METHODS We used open-source software and a publicly available dataset to train and validate multiple ML models to classify breast masses into benign or malignant using mammography image features and patient age. We compared algorithm predictions to the ground truth of histopathologic evaluation. We provide step-by-step instructions with accompanying code lines. FINDINGS Performance of the five algorithms at classifying breast masses as benign or malignant based on mammography image features and patient age was statistically equivalent (P > 0.05). Area under the receiver operating characteristics curve (AUROC) for the logistic regression with elastic net penalty was 0.89 (95% CI 0.85 - 0.94), for the Extreme Gradient Boosting Tree 0.88 (95% CI 0.83 - 0.93), for the Multivariate Adaptive Regression Spline algorithm 0.88 (95% CI 0.83 - 0.93), for the Support Vector Machine 0.89 (95% CI 0.84 - 0.93), and for the neural network 0.89 (95% CI 0.84 - 0.93). INTERPRETATION Our paper allows clinicians and medical researchers who are interested in using ML algorithms to understand and recreate the elements of a comprehensive ML analysis. Following our instructions may help to improve model generalizability and reproducibility in medical ML studies.
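The workflow the paper teaches can be condensed into a sketch like the one below. This is not the authors' published code, but the same pattern under assumed settings: scale the features, tune an elastic-net logistic regression by cross-validated grid search, and report the test-set AUROC on a stand-in dataset.

    # Sketch only: preprocessing + hyperparameter tuning + AUROC evaluation.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000)),
    ])
    grid = GridSearchCV(
        pipe,
        {"clf__C": [0.01, 0.1, 1.0], "clf__l1_ratio": [0.2, 0.5, 0.8]},
        scoring="roc_auc",
        cv=5,
    )
    grid.fit(X_tr, y_tr)
    print("best params:", grid.best_params_)
    print("test AUROC: %.3f" % roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1]))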
Affiliation(s)
- André Pfob
- Department of Obstetrics and Gynecology, University Breast Unit, Heidelberg University Hospital, Heidelberg, Germany
- MD Anderson Center for INSPiRED Cancer Care, The University of Texas MD Anderson Cancer Center, Houston, USA
- Sheng-Chieh Lu
- MD Anderson Center for INSPiRED Cancer Care, The University of Texas MD Anderson Cancer Center, Houston, USA
- Section of Patient-Centered Analytics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Chris Sidey-Gibbons
- MD Anderson Center for INSPiRED Cancer Care, The University of Texas MD Anderson Cancer Center, Houston, USA
- Section of Patient-Centered Analytics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
12. Kim HE, Cosa-Linan A, Santhanam N, Jannesari M, Maros ME, Ganslandt T. Transfer learning for medical image classification: a literature review. BMC Med Imaging 2022; 22:69. [PMID: 35418051] [PMCID: PMC9007400] [DOI: 10.1186/s12880-022-00793-7]
Abstract
BACKGROUND Transfer learning (TL) with convolutional neural networks aims to improve performance on a new task by leveraging knowledge of similar tasks learned in advance. It has made a major contribution to medical image analysis because it overcomes the data scarcity problem and saves time and hardware resources. However, transfer learning has been arbitrarily configured in the majority of studies. This review paper attempts to provide guidance for selecting a model and TL approach for medical image classification tasks. METHODS 425 peer-reviewed articles published in English up until December 31, 2020 were retrieved from two databases, PubMed and Web of Science. Articles were assessed by two independent reviewers, with the aid of a third reviewer in the case of discrepancies. We followed the PRISMA guidelines for paper selection, and 121 studies were regarded as eligible for the scope of this review. We investigated articles focused on selecting backbone models and TL approaches, including feature extractor, feature extractor hybrid, fine-tuning, and fine-tuning from scratch. RESULTS The majority of studies (n = 57) empirically evaluated multiple models, followed by studies using deep models (n = 33) and shallow models (n = 24). Inception, one of the deep models, was the most employed in the literature (n = 26). With respect to TL, the majority of studies (n = 46) empirically benchmarked multiple approaches to identify the optimal configuration. The remaining studies applied only a single approach, of which feature extractor (n = 38) and fine-tuning from scratch (n = 27) were the two most favored. Only a few studies applied feature extractor hybrid (n = 7) or fine-tuning (n = 3) with pretrained models. CONCLUSION The investigated studies demonstrated the efficacy of transfer learning despite the data scarcity. We encourage data scientists and practitioners to use deep models (e.g. ResNet or Inception) as feature extractors, which can save computational costs and time without degrading predictive power.
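The two single-approach configurations the review contrasts, feature extraction and fine-tuning, differ only in which weights are left trainable. The sketch below illustrates the distinction with a torchvision ResNet as an assumed backbone and an arbitrary two-class head; it is a generic illustration, not tied to any reviewed study.

    # Sketch only: frozen feature extractor vs. full fine-tuning.
    import torch.nn as nn
    from torchvision import models

    def build_feature_extractor(num_classes: int = 2) -> nn.Module:
        model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        for p in model.parameters():
            p.requires_grad = False            # freeze the pretrained backbone
        model.fc = nn.Linear(model.fc.in_features, num_classes)  # only this layer trains
        return model

    def build_fine_tuned(num_classes: int = 2) -> nn.Module:
        model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model                           # all weights stay trainable

    fe = build_feature_extractor()
    trainable = sum(p.numel() for p in fe.parameters() if p.requires_grad)
    print(f"feature extractor: {trainable:,} trainable parameters (head only)")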
Affiliation(s)
- Hee E Kim
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany.
- Alejandro Cosa-Linan
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany
- Nandhini Santhanam
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany
- Mahboubeh Jannesari
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany
- Mate E Maros
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany
- Thomas Ganslandt
- Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health (CPD-BW), Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, 68167, Mannheim, Germany
- Chair of Medical Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Wetterkreuz 15, 91058, Erlangen, Germany
13. Ortíz-Barrios MA, Garcia-Constantino M, Nugent C, Alfaro-Sarmiento I. A Novel Integration of IF-DEMATEL and TOPSIS for the Classifier Selection Problem in Assistive Technology Adoption for People with Dementia. Int J Environ Res Public Health 2022; 19:1133. [PMID: 35162153] [PMCID: PMC8834594] [DOI: 10.3390/ijerph19031133]
Abstract
The classifier selection problem in Assistive Technology Adoption refers to selecting the classification algorithms that have the best performance in predicting the adoption of technology, and is often addressed through measuring different single performance indicators. Satisfactory classifier selection can help in reducing time and costs involved in the technology adoption process. As there are multiple criteria from different domains and several candidate classification algorithms, the classifier selection process is now a problem that can be addressed using Multiple-Criteria Decision-Making (MCDM) methods. This paper proposes a novel approach to address the classifier selection problem by integrating Intuitionistic Fuzzy Sets (IFS), Decision Making Trial and Evaluation Laboratory (DEMATEL), and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). The step-by-step procedure behind this application is as follows. First, IF-DEMATEL was used for estimating the criteria and sub-criteria weights considering uncertainty. This method was also employed to evaluate the interrelations among classifier selection criteria. Finally, a modified TOPSIS was applied to generate an overall suitability index per classifier so that the most effective ones can be selected. The proposed approach was validated using a real-world case study concerning the adoption of a mobile-based reminding solution by People with Dementia (PwD). The outputs allow public health managers to accurately identify whether PwD can adopt an assistive technology which results in (i) reduced cost overruns due to wrong classification, (ii) improved quality of life of adopters, and (iii) rapid deployment of intervention alternatives for non-adopters.
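The closing TOPSIS step can be sketched as follows with made-up numbers: given a decision matrix of classifier performance criteria and a weight vector (which in the paper comes from IF-DEMATEL, and whose intuitionistic-fuzzy and modified-TOPSIS details are omitted here), compute closeness coefficients and rank the candidate classifiers.

    # Sketch only: standard TOPSIS ranking of candidate classifiers.
    import numpy as np

    classifiers = ["SVM", "RandomForest", "kNN", "NaiveBayes"]
    # columns: accuracy, sensitivity, specificity, training time (cost criterion)
    X = np.array([
        [0.88, 0.85, 0.90, 30.0],
        [0.91, 0.89, 0.92, 55.0],
        [0.84, 0.80, 0.87, 5.0],
        [0.82, 0.83, 0.81, 2.0],
    ])
    weights = np.array([0.35, 0.30, 0.25, 0.10])   # hypothetical criterion weights
    benefit = np.array([True, True, True, False])  # training time: lower is better

    V = X / np.linalg.norm(X, axis=0) * weights    # weighted, vector-normalised matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    closeness = d_minus / (d_plus + d_minus)

    for name, c in sorted(zip(classifiers, closeness), key=lambda t: -t[1]):
        print(f"{name:12s} closeness = {c:.3f}")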
Affiliation(s)
- Chris Nugent
- School of Computing and Mathematics, Ulster University, Jordanstown BT37 0QB, UK
- Isaac Alfaro-Sarmiento
- Department of Productivity and Innovation, Universidad de la Costa CUC, Barranquilla 081001, Colombia
14. Rowe TW, Katzourou IK, Stevenson-Hoare JO, Bracher-Smith MR, Ivanov DK, Escott-Price V. Machine learning for the life-time risk prediction of Alzheimer's disease: a systematic review. Brain Commun 2021; 3:fcab246. [PMID: 34805994] [PMCID: PMC8598986] [DOI: 10.1093/braincomms/fcab246]
Abstract
Alzheimer’s disease is a neurodegenerative disorder and the most common form of dementia. Early diagnosis may assist interventions to delay onset and reduce the progression rate of the disease. We systematically reviewed the use of machine learning algorithms for predicting Alzheimer’s disease using single nucleotide polymorphisms and instances where these were combined with other types of data. We evaluated the ability of machine learning models to distinguish between controls and cases, while also assessing their implementation and potential biases. Articles published between December 2009 and June 2020 were collected using Scopus, PubMed and Google Scholar. These were systematically screened for inclusion leading to a final set of 12 publications. Eighty-five per cent of the included studies used the Alzheimer's Disease Neuroimaging Initiative dataset. In studies which reported area under the curve, discrimination varied (0.49–0.97). However, more than half of the included manuscripts used other forms of measurement, such as accuracy, sensitivity and specificity. Model calibration statistics were also found to be reported inconsistently across all studies. The most frequent limitation in the assessed studies was sample size, with the total number of participants often numbering less than a thousand, whilst the number of predictors usually ran into the many thousands. In addition, key steps in model implementation and validation were often not performed or unreported, making it difficult to assess the capability of machine learning models.
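The two properties the review found inconsistently reported, discrimination and calibration, can both be computed in a few lines. The sketch below uses a synthetic SNP-like dataset as a stand-in and is not tied to any of the reviewed models.

    # Sketch only: reporting both discrimination (AUC) and calibration.
    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=500, n_informative=20,
                               random_state=0)   # stand-in for SNP predictors
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    prob = clf.predict_proba(X_te)[:, 1]

    print("AUC:", round(roc_auc_score(y_te, prob), 3))
    frac_pos, mean_pred = calibration_curve(y_te, prob, n_bins=5)
    for p, f in zip(mean_pred, frac_pos):
        print(f"predicted {p:.2f} -> observed {f:.2f}")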
Affiliation(s)
- Thomas W Rowe
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Matthew R Bracher-Smith
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK
- Dobril K Ivanov
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Valentina Escott-Price
- UK Dementia Research Institute, Cardiff University, Cardiff, UK
- Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff CF24 4HQ, UK