1. Abbas S, Asif M, Rehman A, Alharbi M, Khan MA, Elmitwally N. Emerging research trends in artificial intelligence for cancer diagnostic systems: A comprehensive review. Heliyon 2024; 10:e36743. [PMID: 39263113 PMCID: PMC11387343 DOI: 10.1016/j.heliyon.2024.e36743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/20/2024] [Accepted: 08/21/2024] [Indexed: 09/13/2024] Open
Abstract
This review article offers a comprehensive analysis of current developments in the application of machine learning for cancer diagnostic systems. The effectiveness of machine learning approaches has become evident in improving the accuracy and speed of cancer detection, addressing the complexities of large and intricate medical datasets. This review aims to evaluate modern machine learning techniques employed in cancer diagnostics, covering various algorithms, including supervised and unsupervised learning, as well as deep learning and federated learning methodologies. Data acquisition and preprocessing methods for different types of data, such as imaging, genomics, and clinical records, are discussed. The paper also examines feature extraction and selection techniques specific to cancer diagnosis. Model training, evaluation metrics, and performance comparison methods are explored. Additionally, the review provides insights into the applications of machine learning in various cancer types and discusses challenges related to dataset limitations, model interpretability, multi-omics integration, and ethical considerations. The emerging field of explainable artificial intelligence (XAI) in cancer diagnosis is highlighted, emphasizing specific XAI techniques proposed to improve cancer diagnostics. These techniques include interactive visualization of model decisions and feature importance analysis tailored for enhanced clinical interpretation, aiming to enhance both diagnostic accuracy and transparency in medical decision-making. The paper concludes by outlining future directions, including personalized medicine, federated learning, deep learning advancements, and ethical considerations. This review aims to guide researchers, clinicians, and policymakers in the development of efficient and interpretable machine learning-based cancer diagnostic systems.
Affiliation(s)
- Sagheer Abbas
  - Department of Computer Science, Prince Mohammad Bin Fahd University, Al-Khobar, KSA
- Muhammad Asif
  - Department of Computer Science, Education University Lahore, Attock Campus, Pakistan
- Abdur Rehman
  - School of Computer Science, National College of Business Administration and Economics, Lahore, 54000, Pakistan
- Meshal Alharbi
  - Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, 11942, Alkharj, Saudi Arabia
- Muhammad Adnan Khan
  - Riphah School of Computing & Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore, 54000, Pakistan
  - School of Computing, Skyline University College, University City Sharjah, 1797, Sharjah, United Arab Emirates
  - Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam-si, 13120, Republic of Korea
- Nouh Elmitwally
  - Department of Computer Science, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, 12613, Egypt
  - School of Computing and Digital Technology, Birmingham City University, Birmingham, B4 7XG, UK
2. Zhang H, Hussin H, Hoh CC, Cheong SH, Lee WK, Yahaya BH. Big data in breast cancer: Towards precision treatment. Digit Health 2024; 10:20552076241293695. [PMID: 39502482 PMCID: PMC11536614 DOI: 10.1177/20552076241293695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Accepted: 10/07/2024] [Indexed: 11/08/2024] Open
Abstract
Breast cancer is the most prevalent and deadliest cancer among women globally, representing a major threat to public health. In response, the World Health Organization has established the Global Breast Cancer Initiative framework to reduce breast cancer mortality through global collaboration. The integration of big data analytics (BDA) and precision medicine has transformed our understanding of breast cancer's biological traits and treatment responses. By harnessing large-scale datasets - encompassing genetic, clinical, and environmental data - BDA has enhanced strategies for breast cancer prevention, diagnosis, and treatment, driving the advancement of precision oncology and personalised care. Despite the increasing importance of big data in breast cancer research, comprehensive studies remain sparse, underscoring the need for more systematic investigation. This review evaluates the contributions of big data to breast cancer precision medicine while addressing the associated opportunities and challenges. Through the application of big data, we aim to deepen insights into breast cancer pathogenesis, optimise therapeutic approaches, improve patient outcomes, and ultimately contribute to better survival rates and quality of life. This review seeks to provide a foundation for future research in breast cancer prevention, treatment, and management.
Affiliation(s)
- Hao Zhang
  - Breast Cancer Translational Research Program (BCTRP@IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
  - Department of Biomedical Sciences, Advanced Medical and Dental Institute (IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
- Hasmah Hussin
  - Breast Cancer Translational Research Program (BCTRP@IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
  - Department of Clinical Medicine, Advanced Medical and Dental Institute (IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
- Wei-Kang Lee
  - Codon Genomics Sdn Bhd, Seri Kembangan, Selangor, Malaysia
- Badrul Hisham Yahaya
  - Breast Cancer Translational Research Program (BCTRP@IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
  - Department of Biomedical Sciences, Advanced Medical and Dental Institute (IPPT), Universiti Sains Malaysia, Kepala Batas, Penang, Malaysia
3. Ahmed M, Mäkinen VP, Lumsden A, Boyle T, Mulugeta A, Lee SH, Olver I, Hyppönen E. Metabolic profile predicts incident cancer: A large-scale population study in the UK Biobank. Metabolism 2023; 138:155342. [PMID: 36377121 DOI: 10.1016/j.metabol.2022.155342] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2022] [Revised: 10/24/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND AND AIMS: Analyses to predict the risk of cancer typically focus on single biomarkers, which do not capture their complex interrelations. We hypothesized that the use of metabolic profiles may provide new insights into cancer prediction. METHODS: We used information from 290,888 UK Biobank participants aged 37 to 73 years at baseline. Metabolic subgroups were defined based on clustering of biochemical data using an artificial neural network approach and examined for their association with incident cancers identified through linkage to the cancer registry. In addition, we evaluated associations between 38 individual biomarkers and cancer risk. RESULTS: In total, 21,973 individuals developed cancer during follow-up (median 3.87 years, interquartile range [IQR] = 2.03-5.58). Compared to the metabolically favorable subgroup (IV), subgroup III (defined as "high BMI, C-reactive protein & cystatin C") was associated with a higher risk of obesity-related cancers (hazard ratio [HR] = 1.26, 95% CI = 1.21 to 1.32) and hematologic malignancies (e.g., lymphoid leukemia: HR = 1.83, 95% CI = 1.44 to 2.33). Subgroup II ("high triglycerides & liver enzymes") was strongly associated with liver cancer risk (HR = 5.70, 95% CI = 3.57 to 9.11). Analysis of individual biomarkers showed a positive association between testosterone and greater risks of hormone-sensitive cancers (HR per SD higher = 1.32, 95% CI = 1.23 to 1.44) and liver cancer (HR = 2.49, 95% CI = 1.47 to 4.24). Many liver tests were individually associated with a greater risk of liver cancer, with the strongest association observed for gamma-glutamyl transferase (HR = 2.40, 95% CI = 2.19 to 2.65). CONCLUSIONS: Metabolic profile in middle-to-older age can predict cancer incidence, in particular the risk of obesity-related cancer, hematologic malignancies, and liver cancer. Elevated values from liver tests are strong predictors of later risk of liver cancer.
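As a reading aid for the hazard ratios quoted in this abstract, the following minimal Python sketch shows how a reported HR and its 95% CI can be converted back into an approximate standard error and Wald z statistic. It assumes the intervals are Wald-type and symmetric on the log scale, which is common for Cox regression output but is not stated in the abstract; the function names `log_hr_se` and `wald_z` are illustrative only.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def log_hr_se(lower, upper, z_crit=Z_95):
    """Approximate SE of log(HR), assuming a CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def wald_z(hr, lower, upper):
    """Approximate Wald z statistic for the null hypothesis HR = 1."""
    return math.log(hr) / log_hr_se(lower, upper)


# Values quoted in the abstract (subgroup III vs IV, obesity-related cancers).
se_obesity = log_hr_se(1.21, 1.32)
z_obesity = wald_z(1.26, 1.21, 1.32)

# Values quoted in the abstract (subgroup II vs IV, liver cancer).
se_liver = log_hr_se(3.57, 9.11)
z_liver = wald_z(5.70, 3.57, 9.11)

# Under the log-symmetry assumption, the geometric mean of the CI bounds
# should land close to the reported point estimate.
center_obesity = math.sqrt(1.21 * 1.32)
center_liver = math.sqrt(3.57 * 9.11)

print(f"obesity-related: SE={se_obesity:.4f}, z={z_obesity:.2f}, centre={center_obesity:.3f}")
print(f"liver:           SE={se_liver:.4f}, z={z_liver:.2f}, centre={center_liver:.3f}")
```

The much wider interval for liver cancer translates into a larger standard error on the log scale, which is what one would expect for a rarer outcome.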
Affiliation(s)
- Muktar Ahmed
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; Department of Epidemiology, Faculty of Public Health, Jimma University Institute of Health, Jimma, Ethiopia; UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Ville-Petteri Mäkinen
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; Computational Systems Biology Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Amanda Lumsden
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia
- Terry Boyle
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia; UniSA Allied Health & Human Performance, University of South Australia, Adelaide, SA, Australia
- Anwar Mulugeta
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Sang Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia; UniSA Allied Health & Human Performance, University of South Australia, Adelaide, SA, Australia
- Ian Olver
  - School of Psychology, Faculty of Health and Medical Sciences, University of Adelaide, Australia
- Elina Hyppönen
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia; UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
4. Parra-Rodríguez L, Reyes-Ramírez E, Jiménez-Andrade JL, Carrillo-Calvet H, García-Peña C. Self-Organizing Maps to Multidimensionally Characterize Physical Profiles in Older Adults. International Journal of Environmental Research and Public Health 2022; 19:12412. [PMID: 36231709 PMCID: PMC9565208 DOI: 10.3390/ijerph191912412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 09/10/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
The aim of this study is to automatically analyze, characterize and classify physical performance and body composition data of a cohort of Mexican community-dwelling older adults. Self-organizing maps (SOM) were used to identify similar profiles in 562 older adults living in Mexico City that participated in this study. Data regarding demographics, geriatric syndromes, comorbidities, physical performance, and body composition were obtained. The sample was divided by sex, and the multidimensional analysis included age, gait speed over height, grip strength over body mass index, one-legged stance, lean appendicular mass percentage, and fat percentage. Using the SOM neural network, seven profile types for older men and women were identified. This analysis provided maps depicting a set of clusters qualitatively characterizing groups of older adults that share similar profiles of body composition and physical performance. The SOM neural network proved to be a useful tool for analyzing multidimensional health care data and facilitating its interpretability. It provided a visual representation of the non-linear relationship between physical performance and body composition variables, as well as the identification of seven characteristic profiles in this cohort.
Affiliation(s)
- José Luis Jiménez-Andrade
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
  - Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, INFOTEC, Mexico City 14050, Mexico
- Humberto Carrillo-Calvet
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Carmen García-Peña
  - Research Department, Instituto Nacional de Geriatría, Mexico City 10200, Mexico
5. Carrillo-Vega MF, Pérez-Zepeda MU, Salinas-Escudero G, García-Peña C, Reyes-Ramírez ED, Espinel-Bermúdez MC, Sánchez-García S, Parra-Rodríguez L. Patterns of Muscle-Related Risk Factors for Sarcopenia in Older Mexican Women. International Journal of Environmental Research and Public Health 2022; 19:10239. [PMID: 36011874 PMCID: PMC9408641 DOI: 10.3390/ijerph191610239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/26/2022] [Revised: 07/27/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Early deterioration in muscle mass quantity, quality, and functionality, determined by calf circumference (CC), phase angle (PA), gait time (GT), and grip strength (GSt), may be considered a risk factor for sarcopenia. Patterns derived from these parameters could identify an early stage of this disease in a timely manner. Thus, the present work aims to identify those patterns of muscle-related parameters and their association with sarcopenia in a cohort of older Mexican women using neural network analysis. Methods: Information from the study "Functional decline patterns at the end of life, related factors, and associated costs" was used. A self-organizing map (SOM) was used to analyze the information. A SOM is an unsupervised machine learning technique that projects input variables onto a low-dimensional hexagonal grid, which can be used to visualize and explore properties of the data and to cluster individuals with similar age, GT, GSt, CC, and PA. An unadjusted logistic regression model assessed the probability of having sarcopenia given a particular cluster. Results: 250 women were evaluated. Mean age was 68.54 ± 5.99 years; sarcopenia was present in 31 (12.4%). Clusters 1 and 2 had similar GT, GSt, and CC values. Moreover, in cluster 1, women were older with higher PA values (p < 0.001). From cluster 3 upward, there is a trend of worse scores for every variable. Moreover, 100% of the participants in cluster 6 have sarcopenia (p < 0.001). Women in clusters 4 and 5 were 19.29 and 90 times more likely, respectively, to develop sarcopenia than those from cluster 2 (p < 0.01). Conclusions: The joint use of age, GSt, GT, CC, and PA is strongly associated with the probability of women presenting with sarcopenia.
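To make the SOM description above concrete, here is a minimal NumPy sketch of the general technique. It is not the authors' implementation: it uses a rectangular rather than hexagonal grid for brevity, runs on synthetic data standing in for five standardized variables (such as age, GT, GSt, CC, and PA), and the function names, grid size, and learning-rate schedule are all illustrative assumptions.

```python
import numpy as np


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    weights = rng.normal(size=(rows, cols, n_features))
    # Grid coordinates of every unit, used for neighbourhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
            x = data[i]
            # Best matching unit (BMU): the unit whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU, measured on the grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma**2))
            # Pull the BMU and its grid neighbours toward the sample.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def assign_units(data, weights):
    """Return the flat index of the best matching unit for every sample."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)


# Synthetic stand-in data: two well-separated groups with five features each.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=-3.0, scale=0.5, size=(40, 5))
group_b = rng.normal(loc=3.0, scale=0.5, size=(40, 5))
data = np.vstack([group_a, group_b])

weights = train_som(data)
units = assign_units(data, weights)
print("units used by group A:", sorted(set(units[:40].tolist())))
print("units used by group B:", sorted(set(units[40:].tolist())))
```

With real clinical variables, each column would normally be standardized first so that no single measurement dominates the distance calculation; map units can then be grouped into clusters and related to an outcome such as sarcopenia.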
Affiliation(s)
- Mario Ulises Pérez-Zepeda
  - Instituto Nacional de Geriatría, Dirección de Investigación, Av. Contreras 428, Ciudad de México 10200, Mexico
  - Centro de Investigación en Ciencias de la Salud (CICSA), Universidad Anáhuac México Campus NorteFCS, Huixquilucan 52786, Mexico
- Guillermo Salinas-Escudero
  - Hospital Infantil de Mexico Federico Gómez, Centro de Estudios Económicos y Sociales en Salud, Calle Doctor Márquez 162, Ciudad de Mexico 06720, Mexico
- Carmen García-Peña
  - Instituto Nacional de Geriatría, Dirección de Investigación, Av. Contreras 428, Ciudad de México 10200, Mexico
- Edward Daniel Reyes-Ramírez
  - Instituto Nacional de Geriatría, Dirección de Investigación, Av. Contreras 428, Ciudad de México 10200, Mexico
- María Claudia Espinel-Bermúdez
  - Instituto Mexicano del Seguro Social, Centro Médico Nacional de Occidente, Unidad Médica de Alta Especialidad Hospital de Especialidades, Unidad de Investigación Biomédica 02 y División de Investigación en Salud, Av. Belisario Domínguez 1000, Guadalajara 44340, Mexico
- Sergio Sánchez-García
  - Instituto Mexicano del Seguro Social, Centro Médico Nacional Siglo XXI, Unidad de Investigación en Epidemiología y Servicios de Salud, Área de Envejecimiento, Av. Cuauhtémoc 330, Ciudad de México 06720, Mexico
- Lorena Parra-Rodríguez
  - Instituto Nacional de Geriatría, Dirección de Investigación, Av. Contreras 428, Ciudad de México 10200, Mexico
6. Ji H, Li J, Zhang Q, Yang J, Duan J, Wang X, Ma B, Zhang Z, Pan W, Zhang H. Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach. BMC Med Genomics 2021; 14:298. [PMID: 34930241 PMCID: PMC8686331 DOI: 10.1186/s12920-021-01144-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Accepted: 12/06/2021] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions (SBSs), previous studies have suggested that mutation signatures are reflected not only in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long sequences next to mutation bases, the understanding of how flanking sequences influence mutation signatures is limited. METHODS: We constructed a long short-term memory-self organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we clustered patients into different classes according to the composition of the mutation sequence signatures by the K-means method and then studied the differences in clinical features and survival between classes. RESULTS: Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. Different features in mutation bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to the MB composition. Significant differences in survival and clinical features were observed among different patient classes. CONCLUSIONS: We provided a method for analyzing the characteristics of mutant sequences. The results of this study showed that flanking sequences, together with mutation bases, shape the signatures of SBSs. MBs were shown to be related to the clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor of clinical prognosis. Further study of the mechanism by which MBs relate to cancer characteristics is suggested.
Affiliation(s)
- Hongchen Ji
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
  - Faculty of Hepatopancreatobiliary Surgery, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, China
- Junjie Li
  - Department of Emergency, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, China
- Qiong Zhang
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
- Jingyue Yang
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
- Juanli Duan
  - Department of Hepatobiliary Surgery, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, China
- Xiaowen Wang
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
- Ben Ma
  - Faculty of Hepatopancreatobiliary Surgery, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, China
- Zhuochao Zhang
  - Faculty of Hepatopancreatobiliary Surgery, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, China
- Wei Pan
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
- Hongmei Zhang
  - Department of Oncology, Xijing Hospital, Fourth Military Medical University, No. 127 West Changle Road, Xi'an, 710032, China
7. Zhang Q, Bu X, Zhang M, Zhang Z, Hu J. Dynamic uncertain causality graph for computer-aided general clinical diagnoses with nasal obstruction as an illustration. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09871-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
8. Pina A, Macedo MP, Henriques R. Clustering Clinical Data in R. Methods Mol Biol 2020; 2051:309-343. [PMID: 31552636 DOI: 10.1007/978-1-4939-9744-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
We are currently witnessing a paradigm shift from evidence-based medicine to precision medicine, which has been made possible by the enormous development of technology. Advances in data mining algorithms will allow us to integrate trans-omics with clinical data, contributing to our understanding of pathological mechanisms and massively impacting the clinical sciences. Cluster analysis is one of the main data mining techniques and allows for the exploration of data patterns that the human mind cannot capture. This chapter focuses on the cluster analysis of clinical data using the statistical software R. We outline the cluster analysis process, underlining some clinical data characteristics. Starting with the data preprocessing step, we then discuss the advantages and disadvantages of the most commonly used clustering algorithms and point to examples of their applications in clinical work. Finally, we briefly discuss how to validate clusters. Throughout the chapter we highlight R packages suitable for each computational step of cluster analysis.
Affiliation(s)
- Ana Pina
  - Centro de Estudos de Doenças Crónicas (CEDOC), NOVA Medical School-Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal; ProRegeM PhD Programme, NOVA Medical School/Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal; Department of Medical Sciences, Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Maria Paula Macedo
  - Centro de Estudos de Doenças Crónicas (CEDOC), NOVA Medical School-Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal; Department of Medical Sciences, Institute of Biomedicine, University of Aveiro, Aveiro, Portugal; APDP-Diabetes Portugal Education and Research Center (APDP-ERC), Lisbon, Portugal
- Roberto Henriques
  - NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa, Lisbon, Portugal
9. Kalantari A, Kamsin A, Shamshirband S, Gani A, Alinejad-Rokny H, Chronopoulos AT. Computational intelligence approaches for classification of medical data: State-of-the-art, future challenges and research directions. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.01.126] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Indexed: 01/13/2023]
10. Involvement of Machine Learning for Breast Cancer Image Classification: A Survey. Computational and Mathematical Methods in Medicine 2017; 2017:3781951. [PMID: 29463985 PMCID: PMC5804413 DOI: 10.1155/2017/3781951] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 08/29/2017] [Accepted: 10/26/2017] [Indexed: 11/17/2022]
Abstract
Breast cancer is one of the leading causes of death among women in the world today. Advanced engineering of natural image classification techniques and Artificial Intelligence methods has largely been used for the breast-image classification task. Digital image classification gives doctors and physicians a second opinion and saves them time. Despite the various publications on breast image classification, very few review papers are available which provide a detailed description of breast cancer image classification techniques, feature extraction and selection procedures, classification measuring parameterizations, and image classification findings. We have put a special emphasis on the Convolutional Neural Network (CNN) method for breast image classification. Along with the CNN method we have also described the involvement of the conventional Neural Network (NN), Logic Based classifiers such as the Random Forest (RF) algorithm, Support Vector Machines (SVM), Bayesian methods, and a few of the semisupervised and unsupervised methods which have been used for breast image classification.
11. Ghayoumi Zadeh H, Montazeri A, Abaspur Kazerouni I, Haddadnia J. Clustering and screening for breast cancer on thermal images using a combination of SOM and MLP. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2017. [DOI: 10.1080/21681163.2014.978896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
12. Borkowska EM, Kruk A, Jedrzejczyk A, Rozniecki M, Jablonowski Z, Traczyk M, Constantinou M, Banaszkiewicz M, Pietrusinski M, Sosnowski M, Hamdy FC, Peter S, Catto JWF, Kaluzewski B. Molecular subtyping of bladder cancer using Kohonen self-organizing maps. Cancer Med 2014; 3:1225-34. [PMID: 25142434 PMCID: PMC4302672 DOI: 10.1002/cam4.217] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/17/2013] [Revised: 12/22/2013] [Accepted: 01/19/2014] [Indexed: 11/24/2022] Open
Abstract
Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade bladder cancer (BC) pathways in the tumors from 104 patients. We compared the ability of statistical clustering with a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified tumor grade (Cox regression, P = 0.001, OR 2.9 [95% CI 1.6–5.2]) and the presence of HPV DNA (P = 0.017, OR 3.8 [95% CI 1.3–11.4]) as the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression-free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering.
Affiliation(s)
- Edyta M Borkowska
  - Department of Clinical Genetics, Medical University of Lodz, 3 Sterlinga Street, Lodz, 91-425, Poland; Institute for Cancer Studies and Academic Urology Unit, University of Sheffield, Beech Hill Road, Sheffield, S10 2RX, UK
13. Schilithz AOC, Kale PL, Gama SGN, Nobre FF. Risk groups in children under six months of age using self-organizing maps. Computer Methods and Programs in Biomedicine 2014; 115:1-10. [PMID: 24725333 DOI: 10.1016/j.cmpb.2014.02.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/22/2013] [Revised: 01/22/2014] [Accepted: 02/20/2014] [Indexed: 06/03/2023]
Abstract
Fetal and infant growth tends to follow irregular patterns and, particularly in developing countries, these patterns are greatly influenced by unfavorable living conditions and interactions with complications during pregnancy. The aim of this study was to identify groups of children with different risk profiles for growth development. The study sample comprised 496 girls and 508 boys under six months of age from 27 pediatric primary health care units in the city of Rio de Janeiro, Brazil. Data were obtained through interviews with the mothers and by reviewing each child's health card. An unsupervised learning method known as a self-organizing map (SOM) and a K-means algorithm were used for cluster analysis to identify groups of children. Four groups of infants were identified. The first (139 infants) consisted of infants born exclusively by cesarean delivery, and their mothers were exclusively multiparous; the highest prevalences of prematurity and low birthweight, a high prevalence of exclusive breastfeeding and a low proportion of hospitalization were observed for this group. The second (247 infants) and the third (298 infants) groups had the best and worst perinatal and infant health indicators, respectively. The infants of the fourth group (318 infants) were born heavier, had a low prevalence of exclusive breastfeeding, and had a higher rate of hospitalization. Using a SOM, it was possible to identify children with common features, although no differences between groups were found with respect to the adequacy of postnatal weight. Pregnant women and children with characteristics similar to those of group 3 require early intervention and more attention in public policy.
Affiliation(s)
- P L Kale
  - IESC/UFRJ, Rio de Janeiro, Brazil
14.
15. Observer study of a prototype clinical decision support system for breast cancer diagnosis using dynamic contrast-enhanced MRI. AJR Am J Roentgenol 2013; 200:277-83. [PMID: 23345346 DOI: 10.2214/ajr.12.8718] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
OBJECTIVE The purpose of this article is to evaluate the performance of radiologists using a prototype clinical decision support system to diagnose and manage patients with breast cancer based on dynamic contrast-enhanced MRI studies. MATERIALS AND METHODS The study was conducted with three breast radiologists and two breast imaging fellows who gave patient treatment recommendations and confidence ratings, both without and with computer aid. The computer aid presented similar cases from a retrieval database of 192 lesions (96 malignant and 96 benign) for a test set of 97 mass lesions (46 malignant and 51 benign). The performance of each observer was quantified by receiver operating characteristic analysis. The radiologists' confidence in their recommendations was analyzed with respect to the query case pathologic diagnosis, perceived usefulness of the similar cases, and the accuracy of the computer in retrieving cases of the correct diagnosis. The statistical significance in the performance measure differences was determined by using a two-tailed Student t test for paired data. RESULTS For each observer, the area under the receiver operating characteristic curve did not change significantly with the use of the computer aid (from a mean of 0.8 to a mean of 0.8; p = 0.61). The average confidence of three of the five observers increased significantly with the computer aid (from 5.9 to 6.3 [p < 0.001], from 7.0 to 7.2 [p = 0.04], and from 4.4 to 5.4 [p < 0.001], respectively). The confidence change of the radiologists was more frequent and larger for malignant lesions where the computer was correct. However, for benign lesions, even when the computer was correct, the confidence of the radiologists did not necessarily change. 
CONCLUSION The presentation of similar cases reinforced radiologists' confidence rating in the diagnosis of malignant lesions; however, it did not change their confidence rating for benign lesions or reduce the number of unnecessary biopsies in managing patients with breast cancer using dynamic contrast-enhanced MRI under the limited study conditions.
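The paired comparison named in this abstract (a two-tailed Student t test for paired data) can be sketched as follows. The five AUC values per condition below are made-up illustrative numbers, not the study's data, and the function name is hypothetical; the sketch returns only the t statistic and degrees of freedom and leaves the p-value lookup to a statistics table or library.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-observer AUC values without and with the computer aid.
auc_without = [0.78, 0.82, 0.80, 0.76, 0.84]
auc_with = [0.80, 0.81, 0.83, 0.77, 0.84]
t_stat, dof = paired_t_statistic(auc_without, auc_with)
print(round(t_stat, 4), dof)
```

Because each observer serves as their own control, the test operates on the per-observer differences rather than on the two groups of values separately.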
|
16
|
Zhao W, Davis CE. A modified artificial immune system based pattern recognition approach--an application to clinical diagnostics. Artif Intell Med 2011; 52:1-9. [PMID: 21515033 DOI: 10.1016/j.artmed.2011.03.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2010] [Revised: 03/02/2011] [Accepted: 03/11/2011] [Indexed: 12/27/2022]
Abstract
OBJECTIVE This paper introduces a modified artificial immune system (AIS)-based pattern recognition method to enhance the recognition ability of the existing conventional AIS-based classification approach and demonstrates the superiority of the proposed new AIS-based method via two case studies of breast cancer diagnosis. METHODS AND MATERIALS Conventionally, the AIS approach is often coupled with the k nearest neighbor (k-NN) algorithm to form a classification method called AIS-kNN. In this paper we discuss the basic principle and possible problems of this conventional approach, and propose a new approach where AIS is integrated with radial basis function-partial least squares regression (AIS-RBFPLS). Additionally, both AIS-based approaches are compared with two classical and powerful machine learning methods, back-propagation neural network (BPNN) and orthogonal radial basis function network (Ortho-RBF network). RESULTS The diagnosis results show that: (1) both the AIS-kNN and the AIS-RBFPLS proved to be good machine learning methods for clinical diagnosis, but the proposed AIS-RBFPLS generated an even lower misclassification ratio, especially in the cases where the conventional AIS-kNN approach generated poor classification results because of possibly improper AIS parameters. For example, based upon the AIS memory cells of "replacement threshold=0.3", the average misclassification ratios of the two approaches for study 1 are 3.36% (AIS-RBFPLS) and 9.07% (AIS-kNN), and the misclassification ratios for study 2 are 19.18% (AIS-RBFPLS) and 28.36% (AIS-kNN); (2) the proposed AIS-RBFPLS showed its robustness with respect to the AIS-created memory cells, with a smaller standard deviation of the results from the multiple trials than AIS-kNN. For example, using the result from the first set of AIS memory cells, the standard deviations of the misclassification ratios for study 1 are 0.45% (AIS-RBFPLS) and 8.71% (AIS-kNN) and those for study 2 are 0.49% (AIS-RBFPLS) and 6.61% (AIS-kNN); and (3) the proposed AIS-RBFPLS classification approach also yielded better diagnosis results than the two classical neural network approaches, BPNN and Ortho-RBF network. CONCLUSION In summary, this paper proposed a new machine learning method for complex systems by integrating the AIS system with RBFPLS. This new method demonstrates satisfactory classification accuracy for clinical diagnosis, and also indicates wide potential for application to other diagnosis and detection problems.
Affiliation(s)
- Weixiang Zhao
- Department of Mechanical and Aerospace Engineering, One Shields Avenue, University of California, Davis, CA 95616, United States
|
17
|
Singh S, Maxwell J, Baker JA, Nicholas JL, Lo JY. Computer-aided classification of breast masses: performance and interobserver variability of expert radiologists versus residents. Radiology 2010; 258:73-80. [PMID: 20971779 DOI: 10.1148/radiol.10081308] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
PURPOSE To evaluate the interobserver variability in descriptions of breast masses by dedicated breast imagers and radiology residents and determine how any differences in lesion description affect the performance of a computer-aided diagnosis (CAD) computer classification system. MATERIALS AND METHODS Institutional review board approval was obtained for this HIPAA-compliant study, and the requirement to obtain informed consent was waived. Images of 50 breast lesions were individually interpreted by seven dedicated breast imagers and 10 radiology residents, yielding 850 lesion interpretations. Lesions were described with use of 11 descriptors from the Breast Imaging Reporting and Data System, and interobserver variability was calculated with the Cohen κ statistic. Those 11 features were selected, along with patient age, and merged together by a linear discriminant analysis (LDA) classification model trained by using 1005 previously existing cases. Variability in the recommendations of the computer model for different observers was also calculated with the Cohen κ statistic. RESULTS A significant difference was observed for six lesion features, and radiology residents had greater interobserver variability in their selection of five of the six features than did dedicated breast imagers. The LDA model accurately classified lesions for both sets of observers (area under the receiver operating characteristic curve = 0.94 for residents and 0.96 for dedicated imagers). Sensitivity was maintained at 100% for residents and improved from 98% to 100% for dedicated breast imagers. For residents, the computer model could potentially improve the specificity from 20% to 40% (P < .01) and the κ value from 0.09 to 0.53 (P < .001). For dedicated breast imagers, the computer model could increase the specificity from 34% to 43% (P = .16) and the κ value from 0.21 to 0.61 (P < .001). 
CONCLUSION Among findings showing a significant difference, there was greater interobserver variability in lesion descriptions among residents; however, an LDA model using data from either dedicated breast imagers or residents yielded a consistently high performance in the differentiation of benign from malignant breast lesions, demonstrating potential for improving specificity and decreasing interobserver variability in biopsy recommendations.
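The Cohen κ statistic used in this abstract to quantify interobserver variability can be illustrated for a single pair of observers. The sketch below assumes two readers giving categorical recommendations on the same lesions; the labels and data are hypothetical, and how the study aggregated κ across its 17 observers is not stated in the abstract.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical recommendations from two readers on eight lesions.
reader_1 = ["biopsy", "biopsy", "follow-up", "biopsy",
            "follow-up", "follow-up", "biopsy", "follow-up"]
reader_2 = ["biopsy", "follow-up", "follow-up", "biopsy",
            "follow-up", "biopsy", "biopsy", "follow-up"]
kappa = cohen_kappa(reader_1, reader_2)
print(kappa)
```

Here the readers agree on 6 of 8 lesions (0.75) while chance agreement is 0.5, so κ is 0.5; values near 0 indicate agreement no better than chance.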
Affiliation(s)
- Swatee Singh
- Carl E. Ravin Advanced Imaging Laboratories, Duke University Medical Center, 2424 Erwin Rd, Ste 302, Durham, NC 27705, USA.
|
18
|
Artificial neural networks applied to cancer detection in a breast screening programme. Math Comput Model 2010. [DOI: 10.1016/j.mcm.2010.03.019] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
19
|
Mancini F, Sousa FS, Hummel AD, Falcão AEJ, Yi LC, Ortolani CF, Sigulem D, Pisa IT. Classification of postural profiles among mouth-breathing children by learning vector quantization. Methods Inf Med 2010; 50:349-57. [PMID: 20871942 DOI: 10.3414/me09-01-0039] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2009] [Accepted: 04/27/2010] [Indexed: 11/09/2022]
Abstract
BACKGROUND Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of changes occurring in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. OBJECTIVES The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. METHODS Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ and other classification models was compared in relation to self-organizing maps, back-propagation applied to multilayer perceptrons, Bayesian networks, naive Bayes, J48 decision trees, k, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under ROC curve (AUC), and inter-rater agreement (Kappa statistics). RESULTS By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. CONCLUSIONS The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.
Affiliation(s)
- F Mancini
- Department of Health Informatics, Federal University of São Paulo (UNIFESP), São Paulo, Brazil
|
20
|
Jamieson AR, Giger ML, Drukker K, Li H, Yuan Y, Bhooshan N. Exploring nonlinear feature space dimension reduction and data representation in breast Cadx with Laplacian eigenmaps and t-SNE. Med Phys 2010; 37:339-51. [PMID: 20175497 DOI: 10.1118/1.3267037] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
PURPOSE In this preliminary study, recently developed unsupervised nonlinear dimension reduction (DR) and data representation techniques were applied to computer-extracted breast lesion feature spaces across three separate imaging modalities: Ultrasound (U.S.) with 1126 cases, dynamic contrast enhanced magnetic resonance imaging with 356 cases, and full-field digital mammography with 245 cases. Two methods for nonlinear DR were explored: Laplacian eigenmaps [M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput. 15, 1373-1396 (2003)] and t-distributed stochastic neighbor embedding (t-SNE) [L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008)]. METHODS These methods attempt to map originally high dimensional feature spaces to more human interpretable lower dimensional spaces while preserving both local and global information. The properties of these methods as applied to breast computer-aided diagnosis (CADx) were evaluated in the context of malignancy classification performance as well as in the visual inspection of the sparseness within the two-dimensional and three-dimensional mappings. Classification performance was estimated by using the reduced dimension mapped feature output as input into both linear and nonlinear classifiers: Markov chain Monte Carlo based Bayesian artificial neural network (MCMC-BANN) and linear discriminant analysis. The new techniques were compared to previously developed breast CADx methodologies, including automatic relevance determination (ARD) and linear stepwise (LSW) feature selection, as well as a linear DR method based on principal component analysis. Using ROC analysis and 0.632+ bootstrap validation, 95% empirical confidence intervals were computed for each classifier's AUC performance. RESULTS In the large U.S. data set, sample high performance results include AUC0.632+ = 0.88 with 95% empirical bootstrap interval [0.787;0.895] for 13 ARD selected features and AUC0.632+ = 0.87 with interval [0.817;0.906] for four LSW selected features, compared to 4D t-SNE mapping (from the original 81D feature space) giving AUC0.632+ = 0.90 with interval [0.847;0.919], all using the MCMC-BANN. CONCLUSIONS Preliminary results appear to indicate capability for the new methods to match or exceed classification performance of current advanced breast lesion CADx algorithms. While not appropriate as a complete replacement of feature selection in CADx problems, DR techniques offer a complementary approach, which can aid elucidation of additional properties associated with the data. Specifically, the new techniques were shown to possess the added benefit of delivering sparse lower dimensional representations for visual interpretation, revealing intricate data structure of the feature space.
Affiliation(s)
- Andrew R Jamieson
- Department of Radiology, University of Chicago, Chicago, Illinois 60637, USA.
|
21
|
Oommen BJ, Fayyoumi E. On utilizing association and interaction concepts for enhancing microaggregation in secure statistical databases. IEEE Trans Syst Man Cybern B Cybern 2010; 40:198-207. [PMID: 19643708 DOI: 10.1109/tsmcb.2009.2024949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
This paper presents a possibly pioneering endeavor to tackle the Microaggregation Techniques (MATs) in secure statistical databases by resorting to the principles of associative neural networks (NNs). The prior art has improved the available solutions to the MAT by incorporating proximity information, and this approach is done by recursively reducing the size of the data set by excluding points that are farthest from the centroid and points that are closest to these farthest points. Thus, although the method is extremely effective, arguably, it uses only the proximity information while ignoring the mutual interaction between the records. In this paper, we argue that interrecord relationships can be quantified in terms of the following two entities: 1) their "association" and 2) their "interaction." This case means that records that are not necessarily close to each other may still be "grouped," because their mutual interaction, which is quantified by invoking transitive-closure-like operations on the latter entity, could be significant, as suggested by the theoretically sound principles of NNs. By repeatedly invoking the interrecord associations and interactions, the records are grouped into sizes of cardinality " k," where k is the security parameter in the algorithm. Our experimental results, which are done on artificial data and benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state of the art not only based on the Information Loss (IL) perspective but also when it concerns a criterion that involves a combination of the IL and the Disclosure Risk (DR).
Affiliation(s)
- B John Oommen
- School of Computer Science, Carleton University, Ottawa, ON K1S 5B6, Canada.
|
22
|
Murty US, Srinivasa Rao M, Misra S. Prioritization of malaria endemic zones using self-organizing maps in the Manipur state of India. Inform Health Soc Care 2009; 33:170-8. [DOI: 10.1080/17538150802457687] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
23
|
Oprea AE, Strungaru R, Ungureanu GM. A Self Organizing Map approach to breast cancer detection. Annu Int Conf IEEE Eng Med Biol Soc 2009; 2008:3032-5. [PMID: 19163345 DOI: 10.1109/iembs.2008.4649842] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Detection and characterization of cancer tumors in mammograms is vital in daily clinical practice. The problem of detecting possible cancer areas is very complex due, on one hand, to the diversity in shape of the ill tissue and on the other hand to the poorly defined border between the healthy and the cancerous zone. Even though it has been studied for many years, there are still remaining challenges and directions for future research such as developing better enhancement and segmentation algorithms. The performance of the Self Organizing Map (SOM) in detecting the cancer suspicious regions in digitized mammograms is revealed in this study. In order to achieve the best results we firstly apply the preprocessing algorithms proposed in section II of the study.
Affiliation(s)
- Alina E Oprea
- Applied Electronics and Information Engineering Department, Politehnica University of Bucharest, Bucharest, Romania.
|
24
|
Sampat MP, Patel AC, Wang Y, Gupta S, Kan CW, Bovik AC, Markey MK. Indexes for three-class classification performance assessment--an empirical comparison. IEEE Trans Inf Technol Biomed 2009; 13:300-12. [PMID: 19171528 DOI: 10.1109/titb.2008.2009440] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Assessment of classifier performance is critical for fair comparison of methods, including considering alternative models or parameters during system design. The assessment must not only provide meaningful data on the classifier efficacy, but it must do so in a concise and clear manner. For two-class classification problems, receiver operating characteristic analysis provides a clear and concise assessment methodology for reporting performance and comparing competing systems. However, many other important biomedical questions cannot be posed as "two-class" classification tasks and more than two classes are often necessary. While several methods have been proposed for assessing the performance of classifiers for such multiclass problems, none has been widely accepted. The purpose of this paper is to critically review methods that have been proposed for assessing multiclass classifiers. A number of these methods provide a classifier performance index called the volume under surface (VUS). Empirical comparisons are carried out using 4 three-class case studies, in which three popular classification techniques are evaluated with these methods. Since the same classifier was assessed using multiple performance indexes, it is possible to gain insight into the relative strengths and weakness of the measures. We conclude that: 1) the method proposed by Scurfield provides the most detailed description of classifier performance and insight about the sources of error in a given classification task and 2) the methods proposed by He and Nakas also have great practical utility as they provide both the VUS and an estimate of the variance of the VUS. These estimates can be used to statistically compare two classification algorithms.
Affiliation(s)
- Mehul P Sampat
- Center for Neurological Imaging, Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115, USA.
|
25
|
Li Q, Li F, Doi K. Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier. Acad Radiol 2008; 15:165-75. [PMID: 18206615 DOI: 10.1016/j.acra.2007.09.018] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2007] [Revised: 08/20/2007] [Accepted: 09/21/2007] [Indexed: 11/16/2022]
Abstract
RATIONALE AND OBJECTIVES We have been developing a computer-aided diagnostic (CAD) scheme for lung nodule detection in order to assist radiologists in the detection of lung cancer in thin-section computed tomography (CT) images. MATERIALS AND METHODS Our database consisted of 117 thin-section CT scans with 153 nodules, obtained from a lung cancer screening program at a Japanese university (85 scans, 91 nodules) and from clinical work at an American university (32 scans, 62 nodules). The database included nodules of different sizes (4-28 mm, mean 10.2 mm), shapes, and patterns (solid and ground-glass opacity (GGO)). Our CAD scheme consisted of modules for lung segmentation, selective nodule enhancement, initial nodule detection, feature extraction, and classification. The selective nodule enhancement filter was a key technique for significant enhancement of nodules and suppression of normal anatomic structures such as blood vessels, which are the main sources of false positives. Use of an automated rule-based classifier for reduction of false positives was another key technique; it resulted in a minimized overtraining effect and an improved classification performance. We used a case-based four-fold cross-validation testing method for evaluation of the performance levels of our computerized detection scheme. RESULTS Our CAD scheme achieved an overall sensitivity of 86% (small: 76%, medium-sized: 94%, large: 95%; solid: 86%, mixed GGO: 89%, pure GGO: 81%) with 6.6 false positives per scan; an overall sensitivity of 81% (small: 69%, medium-sized: 91%, large: 91%; solid: 79%, mixed GGO: 88%, pure GGO: 81%) with 3.3 false positives per scan; and an overall sensitivity of 75% (small: 60%, medium-sized: 88%, large: 87%; solid: 70%, mixed GGO: 87%, pure GGO: 81%) with 1.6 false positives per scan. 
CONCLUSION The experimental results indicate that our CAD scheme with its two key techniques can achieve a relatively high performance for nodules presenting large variations in size, shape, and pattern.
Affiliation(s)
- Qiang Li
- Department of Radiology, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA.
|
26
|
Kouskoumvekaki I, Yang Z, Jónsdóttir SO, Olsson L, Panagiotou G. Identification of biomarkers for genotyping Aspergilli using non-linear methods for clustering and classification. BMC Bioinformatics 2008; 9:59. [PMID: 18226195 PMCID: PMC2248563 DOI: 10.1186/1471-2105-9-59] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2007] [Accepted: 01/28/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the present investigation, we have used an exhaustive metabolite profiling approach to search for biomarkers in recombinant Aspergillus nidulans (mutants that produce the 6- methyl salicylic acid polyketide molecule) for application in metabolic engineering. RESULTS More than 450 metabolites were detected and subsequently used in the analysis. Our approach consists of two analytical steps of the metabolic profiling data, an initial non-linear unsupervised analysis with Self-Organizing Maps (SOM) to identify similarities and differences among the metabolic profiles of the studied strains, followed by a second, supervised analysis for training a classifier based on the selected biomarkers. Our analysis identified seven putative biomarkers that were able to cluster the samples according to their genotype. A Support Vector Machine was subsequently employed to construct a predictive model based on the seven biomarkers, capable of distinguishing correctly 14 out of the 16 samples of the different A. nidulans strains. CONCLUSION Our study demonstrates that it is possible to use metabolite profiling for the classification of filamentous fungi as well as for the identification of metabolic engineering targets and draws the attention towards the development of a common database for storage of metabolomics data.
Affiliation(s)
- Irene Kouskoumvekaki
- Center for Microbial Biotechnology, BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs Lyngby, Denmark.
|
27
|
Verma B. Novel network architecture and learning algorithm for the classification of mass abnormalities in digitized mammograms. Artif Intell Med 2008; 42:67-79. [DOI: 10.1016/j.artmed.2007.09.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2007] [Revised: 09/18/2007] [Accepted: 09/27/2007] [Indexed: 10/22/2022]
|
28
|
Chen S, Zhou S, Yin FF, Marks LB, Das SK. Using patient data similarities to predict radiation pneumonitis via a self-organizing map. Phys Med Biol 2007; 53:203-16. [PMID: 18182697 DOI: 10.1088/0031-9155/53/1/014] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
This work investigates the use of the self-organizing map (SOM) technique for predicting lung radiation pneumonitis (RP) risk. SOM is an effective method for projecting and visualizing high-dimensional data in a low-dimensional space (map). By projecting patients with similar data (dose and non-dose factors) onto the same region of the map, commonalities in their outcomes can be visualized and categorized. Once built, the SOM may be used to predict pneumonitis risk by identifying the region of the map that is most similar to a patient's characteristics. Two SOM models were developed from a database of 219 lung cancer patients treated with radiation therapy (34 clinically diagnosed with Grade 2+ pneumonitis). The models were: SOM(all) built from all dose and non-dose factors and, for comparison, SOM(dose) built from dose factors alone. Both models were tested using ten-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Models SOM(all) and SOM(dose) yielded ten-fold cross-validated ROC areas of 0.73 (sensitivity/specificity = 71%/68%) and 0.67 (sensitivity/specificity = 63%/66%), respectively. The significant difference between the cross-validated ROC areas of these two models (p < 0.05) implies that non-dose features add important information toward predicting RP risk. Among the input features selected by model SOM(all), the two with highest impact for increasing RP risk were: (a) higher mean lung dose and (b) chemotherapy prior to radiation therapy. The SOM model developed here may not be extrapolated to treatment techniques outside that used in our database, such as several-field lung intensity modulated radiation therapy or gated radiation therapy.
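The evaluation measures reported in this abstract (ROC area and a sensitivity/specificity pair) can be computed from predicted risk scores as sketched below. This is not the authors' SOM model or their ten-fold cross-validation; the scores, labels, threshold, and function names are hypothetical. The ROC area uses the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case scores higher, with ties counted as half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted pneumonitis risk scores; label 1 marks an event.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
auc = roc_auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(auc, sens, spec)
```

In a cross-validated setting, the scores passed to these functions would come only from held-out folds, so that no case is scored by a model trained on it.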
Affiliation(s)
- Shifeng Chen
- Department of Radiation Oncology, Duke University Medical Center, Durham, NC 27710, USA.
|
29
|
Fischer EA, Lo JY, Markey MK. Bayesian networks of BI-RADS™ descriptors for breast lesion classification. Conf Proc IEEE Eng Med Biol Soc 2007; 2004:3031-4. [PMID: 17270917 DOI: 10.1109/iembs.2004.1403858] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We investigated Bayesian network structure learning and probability estimation from mammographic feature data in order to classify breast lesions into different pathological categories. We compared the learned networks to naive Bayes classifiers, which are similar to the expert systems previously investigated for breast lesion classification. The learned network structures reflect the difference in the classification of biopsy outcome and the invasiveness of malignant lesions for breast masses and microcalcifications. The difference between masses and microcalcifications should be taken into consideration when interpreting systems for automatic pathological classification of breast lesions. The difference may also affect use of these systems for tasks such as estimating the sampling error of biopsy.
Affiliation(s)
- E A Fischer
- Dept. of Biomed. Eng., Texas Univ., Austin, TX, USA
|
30
|
Markey MK, Tourassi GD, Margolis M, DeLong DM. Impact of missing data in evaluating artificial neural networks trained on complete data. Comput Biol Med 2006; 36:516-25. [PMID: 15893745 DOI: 10.1016/j.compbiomed.2005.02.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2004] [Accepted: 02/17/2005] [Indexed: 11/30/2022]
Abstract
This study investigated the impact of missing data in the evaluation of artificial neural network (ANN) models trained on complete data for the task of predicting whether breast lesions are benign or malignant from their mammographic Breast Imaging and Reporting Data System (BI-RADS) descriptors. A feed-forward, back-propagation ANN was tested with three methods for estimating the missing values. Similar results were achieved with a constraint satisfaction ANN, which can accommodate missing values without a separate estimation step. This empirical study highlights the need for additional research on developing robust clinical decision support systems for realistic environments in which key information may be unknown or inaccessible.
Affiliation(s)
- Mia K Markey
- Biomedical Engineering Department, The University of Texas at Austin, 1 University Station, C0800, ENS617B, Austin, TX 78712, USA.
|
31
|
Gupta S, Chyn PF, Markey MK. Breast cancer CADx based on BI-RADS descriptors from two mammographic views. Med Phys 2006; 33:1810-7. [PMID: 16872088 DOI: 10.1118/1.2188080] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
In this study we compared the performance of computer-aided diagnosis (CADx) algorithms based on Breast Imaging Reporting and Data System (BI-RADS) descriptors from one or two views. To select cases for the study with different mediolateral oblique (MLO) and craniocaudal (CC) view descriptors, we assessed the agreement in BI-RADS lesion descriptors, BI-RADS assessment, and subtlety ratings for 1626 cases from the Digital Database for Screening Mammography (DDSM) using kappa statistics. We used 115 mass cases with different descriptors for the two views to design linear discriminant analysis (LDA) based CADx algorithms. The CADx algorithms used BI-RADS descriptors and patient age as features. The algorithms based on BI-RADS descriptors from both views performed marginally better than algorithms based on BI-RADS descriptors from a single view. A system that averaged the results of two classifiers trained separately on the MLO and CC views displayed the best performance (Az=0.920 +/- 0.027). Thus, some improvement in performance of BI-RADS based CADx algorithms may be achieved by combining information from two mammographic views.
Affiliation(s)
- Shalini Gupta
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712, USA
|
32
|
Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia. Comput Biol Med 2006; 37:296-304. [PMID: 16620802 DOI: 10.1016/j.compbiomed.2006.02.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/14/2005] [Revised: 02/15/2006] [Accepted: 02/15/2006] [Indexed: 10/24/2022]
Abstract
Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia--in which the ICD9-CM coding was found to be very poor--the discordance measure was statistically significantly correlated with classification correctness (.45; 95% CI .15-.62).
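The discordance idea described in this abstract can be sketched as follows. The abstract does not give a formula, so the per-case definition used here (the fraction of a case's cluster neighbours whose query answer differs) is an assumption, and the cluster assignments, answers, and the name `case_discordance` are hypothetical.

```python
from collections import defaultdict


def case_discordance(cluster_ids, answers):
    """For each case, the fraction of the other cases in its cluster whose answer differs."""
    by_cluster = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        by_cluster[cluster_id].append(index)

    scores = [0.0] * len(answers)
    for members in by_cluster.values():
        if len(members) < 2:
            continue  # a singleton cluster has no neighbours to disagree with
        for i in members:
            differing = sum(answers[j] != answers[i] for j in members if j != i)
            scores[i] = differing / (len(members) - 1)
    return scores


# Hypothetical data: six cases in two clusters, with a yes/no query answer per case.
clusters = [0, 0, 0, 0, 1, 1]
query_says_pneumonia = [True, True, True, False, False, False]

scores = case_discordance(clusters, query_says_pneumonia)
# Cases with the highest scores are the first candidates for manual review.
review_order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, review_order[0])
```

In this toy input, the fourth case disagrees with every other member of its cluster, so it is ranked first for review.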
Affiliation(s)
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
|
33
|
Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 2006; 39:697-705. [PMID: 16554186 DOI: 10.1016/j.jbi.2006.01.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 10/10/2005] [Revised: 12/24/2005] [Accepted: 01/28/2006] [Indexed: 10/25/2022]
Abstract
BACKGROUND Patient-based similarity metrics are important case-based reasoning tools which may assist with research and patient care applications. Ontology and information content principles may be potentially helpful tools for similarity metric development. METHODS Patient cases from 1989 through 2003 from the Columbia University Medical Center data repository were converted to SNOMED CT concepts. Five metrics were implemented: (1) percent disagreement with data as an unstructured "bag of findings," (2) average links between concepts, (3) links weighted by information content with descendants, (4) links weighted by information content with term prevalence, and (5) path distance using descendants weighted by information content with descendants. Three physicians served as gold standard for 30 cases. RESULTS Expert inter-rater reliability was 0.91, with rank correlations between 0.61 and 0.81, representing upper-bound performance. Expert performance compared to metrics resulted in correlations of 0.27, 0.29, 0.30, 0.30, and 0.30, respectively. Using SNOMED axis Clinical Findings alone increased correlation to 0.37. CONCLUSION Ontology principles and information content provide useful information for similarity metrics but currently fall short of expert performance.
|
34
|
Bhalla R, Narasimhan K, Swarup S. Metabolomics and its role in understanding cellular responses in plants. Plant Cell Rep 2005; 24:562-71. [PMID: 16220342 DOI: 10.1007/s00299-005-0054-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Received: 05/13/2005] [Revised: 07/22/2005] [Accepted: 07/25/2005] [Indexed: 05/04/2023]
Abstract
A natural shift is taking place in the approaches being adopted by plant scientists in response to the accessibility of systems-based technology platforms. Metabolomics is one such field, which involves a comprehensive non-biased analysis of metabolites in a given cell at a specific time. This review briefly introduces the emerging field and a range of analytical techniques that are most useful in metabolomics when combined with computational approaches in data analyses. Using cases from Arabidopsis and other selected plant systems, this review highlights how information can be integrated from metabolomics and other functional genomics platforms to obtain a global picture of plant cellular responses. We discuss how metabolomics is enabling large-scale and parallel interrogation of cell states under different stages of development and defined environmental conditions to uncover novel interactions among various pathways. Finally, we discuss selected applications of metabolomics.
Affiliation(s)
- Ritu Bhalla
- Temasek Life Sciences Laboratory, National University of Singapore, Singapore
|
35
|
Nattkemper TW, Wismüller A. Tumor feature visualization with unsupervised learning. Med Image Anal 2005; 9:344-51. [PMID: 15907392 DOI: 10.1016/j.media.2005.01.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/22/2004] [Revised: 09/30/2004] [Accepted: 01/27/2005] [Indexed: 11/15/2022]
Abstract
Dynamic contrast enhanced magnetic resonance imaging (DCE MRI) is applied for diagnosis and therapy control of breast cancer. The malignancy of a lesion is expressed in the average signal kinetics of selected regions of interest (ROI) representing the lesion. The technique is reported to characterize malignant tumors with high sensitivity and highly variable specificity. Computer-based diagnosis (CAD) systems have been proposed to analyze and classify signal time curve data extracted from hand-selected ROI in the DCE MRI data. In this paper, we apply the self-organizing map (SOM) to a set of time curve feature vectors of single voxels from seven benign lesions and seven malignant tumors. Applying the SOM, we are able to project the time curve values of each voxel on a two-dimensional map. The results show that the SOM is able to visualize the hidden two-dimensional structure of the six-dimensional signal space. Using the trained SOM, we are able to identify voxels with benign or malignant signal characteristics and to visualize lesion cross-sections with pseudo-colors. A comparison with the established three time points method shows that the SOM has clear potential for deriving visualization parameters in DCE MRI analysis.
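The projection of six-dimensional time curves onto a two-dimensional map can be sketched with a very small self-organizing map. This is a generic textbook SOM under stated assumptions, not the authors' implementation: the grid size, learning-rate and neighbourhood schedules, the synthetic "wash-out" and "persistent" curve shapes, and the names `train_som` and `best_matching_unit` are all illustrative choices.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose prototype is closest to the vector."""
    return min(weights,
               key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)))


def train_som(vectors, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) to a prototype vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    order = list(vectors)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5    # neighbourhood radius shrinks over time
        rng.shuffle(order)
        for vector in order:
            bmu_r, bmu_c = best_matching_unit(weights, vector)
            for (r, c), prototype in weights.items():
                grid_dist_sq = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                influence = math.exp(-grid_dist_sq / (2.0 * sigma * sigma))
                for k in range(dim):
                    prototype[k] += lr * influence * (vector[k] - prototype[k])
    return weights


# Synthetic six-point signal curves (hypothetical shapes, not patient data).
noise = random.Random(1)
washout = [0.0, 0.9, 1.0, 0.8, 0.6, 0.5]
persistent = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
curves = [[x + noise.gauss(0, 0.05) for x in shape]
          for shape in (washout, persistent) for _ in range(20)]

som = train_som(curves)
positions = [best_matching_unit(som, v) for v in curves]
print(positions[0], positions[-1])
```

Each voxel's curve is assigned the grid coordinate of its best matching unit; mapping those coordinates to colors is one way a pseudo-color lesion cross-section could be produced.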
Affiliation(s)
- Tim W Nattkemper
- Applied Neuroinformatics Group, Faculty of Technology, Bielefeld University, P.O. Box 100131, D-33501 Bielefeld, Germany.
|
36
|
Nattkemper TW, Arnrich B, Lichte O, Timm W, Degenhard A, Pointon L, Hayes C, Leach MO. Evaluation of radiological features for breast tumour classification in clinical screening with machine learning methods. Artif Intell Med 2004; 34:129-39. [PMID: 15894177 DOI: 10.1016/j.artmed.2004.09.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 02/14/2004] [Revised: 09/02/2004] [Accepted: 09/27/2004] [Indexed: 12/18/2022]
Abstract
OBJECTIVE In this work, methods utilizing supervised and unsupervised machine learning are applied to analyze radiologically derived morphological and calculated kinetic tumour features. The features are extracted from dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) time-course data. MATERIAL The DCE-MRI data of the female breast are obtained within the UK Multicenter Breast Screening Study. The group of patients imaged in this study is selected on the basis of an increased genetic risk for developing breast cancer. METHODS The k-means clustering and self-organizing maps (SOM) are applied to analyze the signal structure in terms of visualization. We employ k-nearest neighbor classifiers (k-nn), support vector machines (SVM) and decision trees (DT) to classify features using a computer aided diagnosis (CAD) approach. RESULTS Regarding the unsupervised techniques, clustering according to features indicating benign and malignant characteristics is observed to a limited extent. The supervised approaches classified the data with 74% accuracy (DT) and provided an area under the receiver operating characteristic (ROC) curve (AUC) of 0.88 (SVM). CONCLUSION It was found that contour and wash-out type (WOT) features determined by the radiologists lead to the best SVM classification results. Although a fast signal uptake in early time-point measurements is an important feature for malignant/benign classification of tumours, our results indicate that the wash-out characteristics might be considered as important.
Affiliation(s)
- Tim W Nattkemper
- Applied Neuroinformatics Group, Bielefeld University, P.O. Box 100130, D-33501 Bielefeld, Germany.
|
37
|
Wyns B, Sette S, Boullart L, Baeten D, Hoffman IEA, De Keyser F. Prediction of diagnosis in patients with early arthritis using a combined Kohonen mapping and instance-based evaluation criterion. Artif Intell Med 2004; 31:45-55. [PMID: 15182846 DOI: 10.1016/j.artmed.2004.01.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/25/2003] [Revised: 08/08/2003] [Accepted: 01/16/2004] [Indexed: 12/23/2022]
Abstract
Rheumatoid arthritis (RA) and spondyloarthropathy (SpA) are the two most frequent forms of chronic autoimmune arthritis. These diseases lead to important inflammatory symptoms resulting in an important functional impairment. This paper introduces a self-organizing artificial neural network combined with a case-based reasoning evaluation criterion to predict diagnosis in patients with early arthritis. Results show that 47.2% of the sample space can be predicted with an accuracy of 84.0% and attaining a high confidence level. 37.7% of the sample space is classified with an overall accuracy of 65.0%. The remaining group was labeled as "undetermined". A general prediction accuracy of 75.6% is reached, exceeding the performance of other approaches such as a backpropagation neural network and the Quest decision tree program. Furthermore, by using this new method, more specifically case-based reasoning, as a helpful tool to classify patients with early arthritis, the possibility of a confidence measure is given, indicating a degree of "belief" of the system in its results. This is often an important feature when dealing with diagnosis in human patients.
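The relation between the figures in this abstract can be checked with a short calculation. The abstract does not state how the general accuracy was derived; the sketch below assumes it is the coverage-weighted accuracy over the two classified groups, with the "undetermined" cases excluded. Under that assumption the reported numbers are mutually consistent. The function name `pooled_accuracy` is hypothetical.

```python
def pooled_accuracy(groups):
    """groups: (coverage_fraction, accuracy) pairs for the subsets that received a prediction."""
    covered = sum(coverage for coverage, _ in groups)
    if covered <= 0:
        raise ValueError("no classified cases")
    correct = sum(coverage * accuracy for coverage, accuracy in groups)
    return correct / covered


# Figures quoted in the abstract: 47.2% of cases at 84.0%, 37.7% of cases at 65.0%.
groups = [(0.472, 0.840), (0.377, 0.650)]

general = pooled_accuracy(groups)
undetermined = 1.0 - sum(coverage for coverage, _ in groups)
print(f"general accuracy = {general:.1%}, undetermined share = {undetermined:.1%}")
```

Under this reading, roughly 15% of the sample space is left unlabeled, which is the trade-off behind attaching a confidence measure to each prediction.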
Affiliation(s)
- B Wyns
- Department of Electrical Energy, Systems and Automation, Faculty of Applied Sciences, Ghent University, Technologiepark 913, 9000 Ghent, Belgium.
|