1
Yi Z, He E, Yang P, Wang Z, Hu X, Feng Y. Artificial neural network prediction of postoperative complications in papillary thyroid microcarcinoma based on preoperative ultrasonographic features. JOURNAL OF CLINICAL ULTRASOUND : JCU 2024. [PMID: 39189355 DOI: 10.1002/jcu.23800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 06/30/2024] [Accepted: 08/07/2024] [Indexed: 08/28/2024]
Abstract
OBJECTIVE To predict post-thyroidectomy complications in papillary thyroid microcarcinoma (PTMC) patients using a deep learning model based on preoperative ultrasonographic features. This study addresses the global rise in PTMC incidence and the challenges in treatment decision-making with high-resolution ultrasonography. METHOD This study enrolled 1638 patients with clinically staged cN0 PTMC who received surgical treatment from 1997 to 2019 at Beijing Friendship Hospital. A deep learning model was developed using a fully connected neural network. Feature selection used 1000 iterations of bootstrap sampling and Recursive Feature Elimination (RFE) to identify the top 10 features. Data preprocessing involved normalization and imputation of missing values. SMOTE was used to address class imbalance. The model was trained and tested on a random data split, with performance metrics including Accuracy (ACC), Area Under the Curve (AUC), Sensitivity (SEN), and Specificity (SPE), visualized through a ROC curve and confusion matrix. RESULTS The fully connected deep neural network model demonstrated high accuracy (ACC 0.81), with an Area Under the Curve (AUC) of 0.74, sensitivity (SEN) of 0.65, and specificity (SPE) of 0.83, as visualized by the ROC curve and confusion matrix. These results highlight the model's reliability and potential as an effective tool in predicting postoperative complications and assisting in clinical decision-making for PTMC patients. CONCLUSION This study highlights the potential of deep learning in enhancing medical predictions and personalized healthcare. Despite promising results, limitations include a single-center data source and unconsidered factors like lifestyle and genetics. Future research should expand data sources, include more influencing factors, and refine algorithms to improve accuracy and applicability in thyroid cancer treatment.
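As a rough illustration of how the metrics named in this abstract (ACC, SEN, SPE) relate to a confusion matrix, the following minimal sketch computes them from the four cell counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only for illustration (not the study's data).
acc, sen, spe = classification_metrics(tp=30, fp=20, tn=80, fn=10)
print(f"ACC={acc:.2f} SEN={sen:.2f} SPE={spe:.2f}")
```

With imbalanced classes, accuracy is dominated by the majority class, which is one reason sensitivity and specificity are reported alongside it.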
Affiliation(s)
- Zhanxiong Yi
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Enhui He
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Peipei Yang
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Zhixiang Wang
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Xiangdong Hu
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Ying Feng
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
2
Mahawan T, Luckett T, Mielgo Iza A, Pornputtapong N, Caamaño Gutiérrez E. Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis. BMC Med Inform Decis Mak 2024; 24:175. [PMID: 38902676 PMCID: PMC11191155 DOI: 10.1186/s12911-024-02578-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/14/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND Machine Learning (ML) plays a crucial role in biomedical research. Nevertheless, it still has limitations in data integration and irreproducibility. To address these challenges, robust methods are needed. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early detection and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, especially metastatic biomarkers, which remains an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to the identification of a PDAC metastatic composite biomarker candidate, and demonstrate the advantages of harnessing data resources. METHODS We utilised primary tumour RNAseq data from five public repositories, pooling samples to maximise statistical power and integrating data by correcting for technical variance. Data were split into train and validation sets. The train dataset underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and five folds were considered robust to build a consensus multivariate model. A random forest model was constructed using selected genes from the train dataset and tested in the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of signals were explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA. RESULTS We developed a pipeline that can detect robust signatures to build composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, proposing a composite biomarker candidate comprising fifteen genes consistently selected that showed very promising predictive capability. Biological contextualisation revealed links with cancer progression and metastasis, underscoring their potential relevance. All code is available on GitHub. CONCLUSION This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings shed light on a promising validation and application path.
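The consensus rule described in the methods (genes found in at least 80% of models and in five folds) can be sketched as a simple counting step. This is only one reading of the abstract, assuming a gene must reach the 80% threshold within a fold for that fold to count; the function name, data layout, and gene labels are hypothetical, and the authors' actual code lives in their GitHub repository.

```python
from collections import Counter


def robust_genes(selections, model_frac=0.8, min_folds=5):
    """Return genes meeting the per-fold model threshold in enough folds.

    selections: one entry per fold; each entry is a list of gene sets,
    one set per fitted model (assumed layout, for illustration only).
    """
    folds_passed = Counter()
    for fold_models in selections:
        # Count in how many models of this fold each gene was selected.
        counts = Counter(g for genes in fold_models for g in set(genes))
        for gene, n in counts.items():
            if n / len(fold_models) >= model_frac:
                folds_passed[gene] += 1
    return sorted(g for g, k in folds_passed.items() if k >= min_folds)


# Toy data: 6 folds with 5 models each (the study used 10 folds, 100 models per fold).
fold_with_b = [{"GENE_A", "GENE_B", "GENE_C"}] * 3 + [{"GENE_A", "GENE_B"}] + [{"GENE_A"}]
fold_without_b = [{"GENE_A", "GENE_C"}] * 3 + [{"GENE_A"}] * 2
toy = [fold_with_b] * 5 + [fold_without_b]
print(robust_genes(toy))
```

In the toy data, GENE_C appears in every fold but never in 80% of a fold's models, so it is excluded.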
Affiliation(s)
- Tanakamol Mahawan
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Department of Biochemistry & System Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Akkhraratchakumari Veterinary College, Walailak University, Nakhon Si Thammarat, Thailand
- Teifion Luckett
- Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ainhoa Mielgo Iza
- Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Natapol Pornputtapong
- Department of Biochemistry and Microbiology, Faculty of Pharmaceutical Sciences, and Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Eva Caamaño Gutiérrez
- Department of Biochemistry & System Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
- Computational Biology Facility, LIV-SRF, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, UK.
3
Lou W, Bonfatti V, Bovenhuis H, Shi R, van der Linden A, Mulder HA, Liu L, Wang Y, Ducro B. Prediction of likelihood of conception in dairy cows using milk mid-infrared spectra collected before the first insemination and machine learning algorithms. J Dairy Sci 2024:S0022-0302(24)00850-6. [PMID: 38825141 DOI: 10.3168/jds.2023-24621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 04/15/2024] [Indexed: 06/04/2024]
Abstract
Accurate and ex-ante prediction of cows' likelihood of conception (LC) based on milk composition information could improve reproduction management on dairy farms. Milk composition is already routinely measured by mid-infrared (MIR) spectra, which are known to change with advancing stages of pregnancy. For lactating cows, MIR spectra may also be used for predicting the LC. Our objectives were to classify the LC at first insemination using milk MIR spectra data collected from calving to first insemination and to identify the spectral regions that contribute the most to the prediction of LC at first insemination. After quality control, 4,866 MIR spectra, milk production, and reproduction records from 3,451 Holstein cows were used. The classification accuracy and area under the curve (AUC) of 6 models comprising different predictors and 3 machine learning methods were estimated and compared. The results showed that partial least squares discriminant analysis (PLS-DA) and random forest had higher prediction accuracies than logistic regression. The classification accuracy of good and poor LC cows and AUC in herd-by-herd validation of the best model were 76.35 ± 10.60% and 0.77 ± 0.11, respectively. All wavenumbers with values of variable importance in the projection higher than 1.00 in PLS-DA belonged to 3 spectral regions, namely from 1,003 to 1,189, 1,794 to 2,260, and 2,300 to 2,660 cm⁻¹. In conclusion, the model can predict LC in dairy cows from a highly productive TMR system before insemination with relatively good accuracy, allowing farmers to intervene in advance or adjust the insemination schedule for cows with a poor predicted LC.
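The abstract does not spell out how herd-by-herd validation was run. A common reading, assumed here, is that each herd is held out in turn as the test set while the model is trained on the remaining herds. The sketch below only generates such splits; the function name and herd labels are hypothetical.

```python
def leave_one_herd_out(herd_ids):
    """Yield (held_out_herd, train_indices, test_indices) for each distinct herd."""
    for held_out in sorted(set(herd_ids)):
        train = [i for i, h in enumerate(herd_ids) if h != held_out]
        test = [i for i, h in enumerate(herd_ids) if h == held_out]
        yield held_out, train, test


# Hypothetical herd labels for six records (not the study's data).
herds = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_herd_out(herds))
for herd, train_idx, test_idx in splits:
    print(herd, train_idx, test_idx)
```

Under this reading, a mean ± SD figure such as the reported accuracy would summarize the per-herd test results.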
Affiliation(s)
- W Lou
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory of Animal Breeding, State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China; Wageningen University & Research, Animal Breeding and Genomics, P.O. Box 338, 6700 AH Wageningen, the Netherlands; Wageningen University & Research, Animal Production Systems, P.O. Box 338, 6700 AH Wageningen, the Netherlands
- V Bonfatti
- Department of Comparative Biomedicine and Food Science, University of Padova, Legnaro, 35020, Italy.
- H Bovenhuis
- Wageningen University & Research, Animal Breeding and Genomics, P.O. Box 338, 6700 AH Wageningen, the Netherlands
- R Shi
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory of Animal Breeding, State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China; Wageningen University & Research, Animal Breeding and Genomics, P.O. Box 338, 6700 AH Wageningen, the Netherlands; Wageningen University & Research, Animal Production Systems, P.O. Box 338, 6700 AH Wageningen, the Netherlands
- A van der Linden
- Wageningen University & Research, Animal Production Systems, P.O. Box 338, 6700 AH Wageningen, the Netherlands
- H A Mulder
- Wageningen University & Research, Animal Breeding and Genomics, P.O. Box 338, 6700 AH Wageningen, the Netherlands
- L Liu
- Beijing Dairy Cattle Center, Beijing, 100192, China
- Y Wang
- Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture of China, National Engineering Laboratory of Animal Breeding, State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- B Ducro
- Wageningen University & Research, Animal Breeding and Genomics, P.O. Box 338, 6700 AH Wageningen, the Netherlands
4
Hassan A, Gulzar Ahmad S, Ullah Munir E, Ali Khan I, Ramzan N. Predictive modelling and identification of key risk factors for stroke using machine learning. Sci Rep 2024; 14:11498. [PMID: 38769427 PMCID: PMC11106277 DOI: 10.1038/s41598-024-61665-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024] Open
Abstract
Strokes are a leading global cause of mortality, underscoring the need for early detection and prevention strategies. However, addressing hidden risk factors and achieving accurate prediction become particularly challenging in the presence of imbalanced and missing data. This study uses three imputation techniques to deal with missing data. To tackle data imbalance, it employs the synthetic minority oversampling technique (SMOTE). The study begins with a baseline model and subsequently employs an extensive range of advanced models. It evaluates the performance of these models by employing k-fold cross-validation on various imbalanced and balanced datasets. The findings reveal that age, body mass index (BMI), average glucose level, heart disease, hypertension, and marital status are the most influential features in predicting strokes. Furthermore, a Dense Stacking Ensemble (DSE) model is built upon the previous advanced models after fine-tuning, with the best-performing model as a meta-classifier. The DSE model demonstrated over 96% accuracy across diverse datasets, with an AUC score of 83.94% on the imbalanced imputed dataset and 98.92% on the balanced one. This research underscores the remarkable performance of the DSE model compared to previous research on the same dataset. It highlights the model's potential for early stroke detection to improve patient outcomes.
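The core idea behind SMOTE, as named in this abstract, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified standard-library illustration of that idea, not the study's implementation or the imbalanced-learn API; the function name, parameters, and points are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority: list of numeric tuples (needs at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical two-feature minority samples, for illustration only.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority_points, n_new=5, k=2, seed=1)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling before cross-validation can leak information between folds; resampling is usually applied inside each training fold only.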
Affiliation(s)
- Ahmad Hassan
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Saima Gulzar Ahmad
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Ehsan Ullah Munir
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Grand Trunk Road, Wah, 47010, Pakistan
- Imtiaz Ali Khan
- Department of Computer Science, Cardiff School of Technologies, Llandaff Campus, Western Avenue, Cardiff, CF5 2YB, UK
- Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley, PA1 2BE, UK.
5
Yurkovich JT, Evans SJ, Rappaport N, Boore JL, Lovejoy JC, Price ND, Hood LE. The transition from genomics to phenomics in personalized population health. Nat Rev Genet 2024; 25:286-302. [PMID: 38093095 DOI: 10.1038/s41576-023-00674-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/03/2023] [Indexed: 03/21/2024]
Abstract
Modern health care faces several serious challenges, including an ageing population and its inherent burden of chronic diseases, rising costs and marginal quality metrics. By assessing and optimizing the health trajectory of each individual using a data-driven personalized approach that reflects their genetics, behaviour and environment, we can start to address these challenges. This assessment includes longitudinal phenome measures, such as the blood proteome and metabolome, gut microbiome composition and function, and lifestyle and behaviour through wearables and questionnaires. Here, we review ongoing large-scale genomics and longitudinal phenomics efforts and the powerful insights they provide into wellness. We describe our vision for the transformation of the current health care from disease-oriented to data-driven, wellness-oriented and personalized population health.
Affiliation(s)
- James T Yurkovich
- Phenome Health, Seattle, WA, USA
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
- Simon J Evans
- Phenome Health, Seattle, WA, USA
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Noa Rappaport
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Institute for Systems Biology, Seattle, WA, USA
- Jeffrey L Boore
- Phenome Health, Seattle, WA, USA
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Jennifer C Lovejoy
- Phenome Health, Seattle, WA, USA
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Institute for Systems Biology, Seattle, WA, USA
- Nathan D Price
- Institute for Systems Biology, Seattle, WA, USA
- Thorne HealthTech, New York, NY, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Leroy E Hood
- Phenome Health, Seattle, WA, USA.
- Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA.
- Institute for Systems Biology, Seattle, WA, USA.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
- Department of Immunology, University of Washington, Seattle, WA, USA.
6
Haghish EF, Nes RB, Obaidi M, Qin P, Stänicke LI, Bekkhus M, Laeng B, Czajkowski N. Unveiling Adolescent Suicidality: Holistic Analysis of Protective and Risk Factors Using Multiple Machine Learning Algorithms. J Youth Adolesc 2024; 53:507-525. [PMID: 37982927 PMCID: PMC10838236 DOI: 10.1007/s10964-023-01892-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/21/2023]
Abstract
Adolescent suicide attempts are on the rise, presenting a significant public health concern. Recent research aimed at improving risk assessment for adolescent suicide attempts has turned to machine learning. However, no studies to date have examined the performance of stacked ensemble algorithms, which are more suitable for low-prevalence conditions. The existing machine learning-based research also lacks population-representative samples, overlooks protective factors and their interplay with risk factors, and neglects established theories on suicidal behavior in favor of purely algorithmic risk estimation. The present study overcomes these shortcomings by comparing the performance of a stacked ensemble algorithm with a diverse set of algorithms, performing a holistic item analysis to identify both risk and protective factors on comprehensive data, and addressing the compatibility of these factors with two competing theories of suicide, namely, The Interpersonal Theory of Suicide and The Strain Theory of Suicide. A population-representative dataset of 173,664 Norwegian adolescents aged 13 to 18 years (mean = 15.14, SD = 1.58, 50.5% female) with a 4.65% rate of reported suicide attempt during the past 12 months was analyzed. Five machine learning algorithms were trained for suicide attempt risk assessment. The stacked ensemble model significantly outperformed other algorithms, achieving equal sensitivity and a specificity of 90.1%, AUC of 96.4%, and AUCPR of 67.5%. All algorithms found recent self-harm to be the most important indicator of adolescent suicide attempt. Exploratory factor analysis suggested five additional risk domains, which we labeled internalizing problems, sleep disturbance, disordered eating, lack of optimism regarding future education and career, and victimization. The identified factors provided stronger support for The Interpersonal Theory of Suicide than for The Strain Theory of Suicide. An enhancement to The Interpersonal Theory based on the risk and protective factors identified by holistic item analysis is presented.
Affiliation(s)
- E F Haghish
- Department of Psychology, University of Oslo, Oslo, Norway.
- Ragnhild Bang Nes
- Department of Mental Health and Suicide, Norwegian Institute of Public Health, Oslo, Norway
- Promenta Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Milan Obaidi
- Department of Psychology, University of Oslo, Oslo, Norway
- Department of Psychology, Copenhagen University, Copenhagen, Denmark
- Ping Qin
- National Centre for Suicide Research and Prevention, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
- Line Indrevoll Stänicke
- Department of Psychology, University of Oslo, Oslo, Norway
- Nic Waals Institute, Lovisenberg hospital, Oslo, Norway
- Mona Bekkhus
- Promenta Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Bruno Laeng
- Department of Psychology, University of Oslo, Oslo, Norway
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Nikolai Czajkowski
- Department of Mental Health and Suicide, Norwegian Institute of Public Health, Oslo, Norway
- Promenta Research Center, Department of Psychology, University of Oslo, Oslo, Norway
7
Yang J, Hao Z, Xu J, Wang J, Jiang X. Fusion machine learning model predicts CAD-CAM ceramic colors and the corresponding minimal thicknesses over various clinical backgrounds. Dent Mater 2024; 40:285-296. [PMID: 37996303 DOI: 10.1016/j.dental.2023.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 11/13/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
OBJECTIVES This study developed and optimized a machine learning model to accurately predict the final colors of CAD-CAM ceramics and determine the minimum thicknesses required to cover different clinical backgrounds. METHODS A total of 120 ceramic specimens (2 mm, 1 mm and 0.5 mm thickness; n = 10) of four CAD-CAM ceramics - IPS e.max, IPS ZirCAD, Upcera Li CAD and Upcera TT CAD - were studied. The CIELab coordinates (L*, a* and b*) of each specimen were obtained over seven different clinical backgrounds (A1, A2, A3.5, ND2, ND7, cobalt-chromium alloy (CC) and medium precious alloy (MPA)) using a digital spectrophotometer. The color difference (ΔE) and lightness difference (ΔL) results were submitted to 39 different models. The prediction results from the top-performing models were used to develop a fusion model via the Stacking ensemble learning method for best-fitting prediction. SHapley Additive exPlanations (SHAP) analysis was performed to interpret the feature importance. RESULTS The fusion model, which combined the ExtraTreesRegressor (ET) and XGBRegressor (XGB) models, demonstrated minimal prediction errors (R² = 0.9) in the external testing sets. Among the investigated variables, thickness and background colors (CC and MPA) had the greatest influence on the final color of the restoration. To achieve perfect aesthetic restoration (ΔE < 2.6), at least 1.9 mm of IPS ZirCAD or 1.6 mm of Upcera TT CAD was required to cover the CC background, while the two tested glass-ceramics did not meet the requirements even with thicknesses over 2 mm. SIGNIFICANCE The fusion model provided a promising tool for automated decision-making in material selection with minimal thickness over various clinical backgrounds.
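The abstract does not state which ΔE formula was used. As an assumption for illustration, the sketch below uses the CIE76 definition, in which ΔE is the Euclidean distance between two (L*, a*, b*) triples, and checks it against the ΔE < 2.6 threshold quoted above. The function names and CIELab readings are hypothetical.

```python
import math


def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 definition)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))


def meets_threshold(lab_restoration, lab_target, threshold=2.6):
    """True when the color difference is below the acceptability threshold."""
    return delta_e_cie76(lab_restoration, lab_target) < threshold


# Hypothetical CIELab readings, for illustration only (not measured values).
target = (70.0, 2.0, 18.0)
measured = (71.0, 2.0, 20.0)
print(round(delta_e_cie76(measured, target), 3), meets_threshold(measured, target))
```

Other formulas such as CIEDE2000 weight the lightness, chroma, and hue terms differently, so thresholds are not interchangeable between formulas.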
Affiliation(s)
- Jiawei Yang
- Department of Prosthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; College of Stomatology, Shanghai Jiao Tong University, Shanghai, China; National Center for Stomatology, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China
- Zezhou Hao
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Jiani Xu
- Department of Prosthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; College of Stomatology, Shanghai Jiao Tong University, Shanghai, China; National Center for Stomatology, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China
- Jie Wang
- Department of Prosthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; College of Stomatology, Shanghai Jiao Tong University, Shanghai, China; National Center for Stomatology, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China.
- Xinquan Jiang
- Department of Prosthodontics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; College of Stomatology, Shanghai Jiao Tong University, Shanghai, China; National Center for Stomatology, National Clinical Research Center for Oral Diseases, Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai Engineering Research Center of Advanced Dental Technology and Materials, Shanghai, China.
8
Ejiyi CJ, Qin Z, Monday H, Ejiyi MB, Ukwuoma C, Ejiyi TU, Agbesi VK, Agu A, Orakwue C. Breast cancer diagnosis and management guided by data augmentation, utilizing an integrated framework of SHAP and random augmentation. Biofactors 2024; 50:114-134. [PMID: 37695269 DOI: 10.1002/biof.1995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 07/18/2023] [Indexed: 09/12/2023]
Abstract
Recent research indicates that early detection of breast cancer (BC) is critical in achieving favorable treatment outcomes and reducing the mortality rate associated with it. Because balanced datasets for diagnosing the disease are difficult to obtain, many researchers have relied on data augmentation techniques, producing datasets of varying quality and results. The dataset we focused on in this study is crafted from SHapley Additive exPlanations (SHAP)-augmentation and random augmentation (RA) approaches to dealing with imbalanced data. This was carried out on the Wisconsin BC dataset, and the effectiveness of this approach to the diagnosis of BC was checked using six machine-learning algorithms. RA synthetically generated some parts of the dataset, while SHAP helped in assessing the quality of the attributes, which were selected and used for the training of the models. The results of our analysis show that the performance of most of the models increased by more than 3% when using the dataset obtained by the integration of SHAP and RA. Additionally, after diagnosis, it is important to focus on providing quality care to ensure the best possible outcomes for patients. Proper management of the disease state is crucial to reduce the recurrence of the disease and other associated complications. Thus, the interpretability provided by SHAP informs the management strategies in this study, focusing on the quality of care given to the patient and how timely the care is.
Affiliation(s)
- Chukwuebuka Joseph Ejiyi
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Qin
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Happy Monday
- Department of Computer Science, Oxford Brookes University and Chengdu University of Technology of China, Chengdu, China
- Chiagoziem Ukwuoma
- School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Thomas Ugochukwu Ejiyi
- Department of Pure and Industrial Chemistry, University of Nigeria Nsukka, Enugu, Nigeria
- Victor Kwaku Agbesi
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Amarachi Agu
- Department of Public Health, University of Nigeria Enugu Campus, Enugu, Nigeria
- Chiduzie Orakwue
- Department of Agricultural and Bio-Resources Engineering, College of Engineering Federal University of Agriculture Abeokuta, Nigeria
9
Almufareh MF, Tariq N, Humayun M, Almas B. A Federated Learning Approach to Breast Cancer Prediction in a Collaborative Learning Framework. Healthcare (Basel) 2023; 11:3185. [PMID: 38132075 PMCID: PMC10743267 DOI: 10.3390/healthcare11243185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 12/23/2023] Open
Abstract
Breast cancer continues to pose a substantial worldwide public health concern, necessitating the use of sophisticated diagnostic methods to enable timely identification and management. The present research utilizes an iterative methodology for collaborative learning, using Deep Neural Networks (DNN) to construct a breast cancer detection model with a high level of accuracy. By leveraging Federated Learning (FL), this collaborative framework effectively utilizes the combined knowledge and data assets of several healthcare organizations while ensuring the protection of patient privacy and data security. The model described in this study showcases significant progress in the field of breast cancer diagnosis, with a maximum accuracy rate of 97.54%, precision of 96.5%, and recall of 98.0%, by using an optimum feature selection technique. Data augmentation approaches play a crucial role in decreasing loss and improving model performance. Significantly, the F1-Score, a comprehensive metric for evaluating performance, turns out to be 97%. This study signifies a notable advancement in the field of breast cancer screening, fostering hope for improved patient outcomes via increased accuracy and reliability. This study highlights the potential impact of collaborative learning, particularly FL, in transforming breast cancer detection. The incorporation of privacy considerations and the use of diverse data sources contribute to the advancement of early detection and the treatment of breast cancer, hence yielding significant benefits for patients on a global scale.
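The reported F1-Score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below performs that arithmetic; the function name is illustrative, and the inputs are the two figures quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 96.5% and recall 98.0%, as quoted in the abstract.
f1 = f1_score(0.965, 0.980)
print(f"F1 = {f1:.4f}")
```

The result rounds to 0.97, which is consistent with the 97% F1-Score stated in the abstract.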
Affiliation(s)
- Maram Fahaad Almufareh
- Department of Information Systems, College of Computer and Information Sciences, Jouf University, Al Jouf 72311, Saudi Arabia;
- Noshina Tariq
- Department of Avionics Engineering, Air University, Islamabad 44000, Pakistan;
- Mamoona Humayun
- Department of Information Systems, College of Computer and Information Sciences, Jouf University, Al Jouf 72311, Saudi Arabia;
- Bushra Almas
- Institute of Information Technology, Quaid-i-Azam University, Islamabad 45320, Pakistan;
10
Ghavidel A, Pazos P. Machine learning (ML) techniques to predict breast cancer in imbalanced datasets: a systematic review. J Cancer Surviv 2023:10.1007/s11764-023-01465-3. [PMID: 37749361 DOI: 10.1007/s11764-023-01465-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 09/09/2023] [Indexed: 09/27/2023]
Abstract
Knowledge discovery in databases (KDD) is crucial in analyzing data to extract valuable insights. In medical outcome prediction, KDD is increasingly applied, particularly in diseases with high incidence, mortality, and costs, like cancer. ML techniques can develop more accurate predictive models for cancer patients' clinical outcomes, aiding informed healthcare decision-making. However, cancer prediction modeling faces challenges because of the unbalanced nature of the datasets, where there is a small minority category of patients with a cancer diagnosis compared to a majority category of cancer-free patients. Imbalanced datasets pose statistical hurdles like bias and overfitting when developing accurate prediction models. This systematic review focuses on breast cancer prediction articles published from 2008 to 2023. The objective is to examine the ML methods that address the imbalanced data problem in breast cancer prediction across three critical steps of KDD: preprocessing, data mining, and interpretation. This work synthesizes prior research in ML methods for breast cancer prediction. The findings help identify effective preprocessing strategies, including balancing and feature selection methods, robust predictive models, and evaluation metrics of those models. The study aims to inform healthcare providers and researchers about effective techniques for accurate breast cancer prediction.
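One of the simplest balancing methods of the kind this review surveys is random oversampling, which duplicates minority-class records until the classes are the same size. The sketch below is illustrative only and is not drawn from the review: the function `random_oversample` and the toy data are assumptions, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. Hypothetical helper."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(pool, k=target - n)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): 9 cancer-free records (0) and 1 cancer record (1).
rows = [[i] for i in range(10)]
labels = [0] * 9 + [1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

Balancing of this kind is normally applied to the training split only, so that duplicated records do not leak into the data used for evaluation.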
Affiliation(s)
- Arman Ghavidel
- Engineering Management and Systems Engineering, Old Dominion University, Norfolk, VA, USA
- Pilar Pazos
- Engineering Management and Systems Engineering, Old Dominion University, Norfolk, VA, USA.
|
11
|
Prusty S, Patnaik S, Dash SK. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. FRONTIERS IN NANOTECHNOLOGY 2022. [DOI: 10.3389/fnano.2022.972421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Cancer is the unregulated growth of abnormal cells in the human body. Cervical cancer, also known as cervix cancer, develops on the surface of the cervix, where an overabundance of cells builds up and eventually forms a lump or tumour. Early detection is therefore essential for choosing an effective treatment. Machine Learning (ML) techniques can help predict cervical cancer before it becomes too serious. In this study, four common diagnostic tests, namely Hinselmann, Schiller, Cytology, and Biopsy, are compared and predicted with four common ML models, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (K-NN), and Extreme Gradient Boosting (XGB). To improve the performance of the ML models, the Stratified k-fold cross-validation (SKCV) method is applied. The experimental findings indicate that using an RF classifier to analyze cervical cancer risk could be a good alternative for assisting clinical specialists in classifying this disease in advance.
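Stratified k-fold cross-validation splits the data into k folds so that each fold keeps roughly the same class proportions as the full dataset. The following is a minimal standard-library sketch of that idea, not the authors' implementation: the generator `stratified_k_fold` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs in which every test fold
    preserves the class proportions of `labels`. Hypothetical helper."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy labels (made up): 80 negatives and 20 positives.
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5))
for train, test in splits:
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

With these toy labels, each of the five test folds holds 20 samples, 4 of them positive, which matches the 20% positive rate of the whole set. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used instead.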
|
12
|
De la Garza Ramos R, Yassari R. Letter to the Editor on "an Artificial Intelligence Approach to Predicting Unplanned Intubation Following Anterior Cervical Discectomy and Fusion" by Veeramani et al. Global Spine J 2022; 12:1304-1305. [PMID: 35350910 PMCID: PMC9210229 DOI: 10.1177/21925682221085545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Affiliation(s)
- Rafael De la Garza Ramos
- Spine Research Group, Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA, Department of Neurological Surgery, Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
- Reza Yassari
- Spine Research Group, Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA, Department of Neurological Surgery, Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY, USA
|