1
Zamanian H, Shalbaf A. Estimation of non-alcoholic steatohepatitis (NASH) disease using clinical information based on the optimal combination of intelligent algorithms for feature selection and classification. Comput Methods Biomech Biomed Engin 2024; 27:964-979. [PMID: 37254745 DOI: 10.1080/10255842.2023.2217978] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/24/2022] [Accepted: 05/12/2023] [Indexed: 06/01/2023]
Abstract
Early diagnosis of NASH can decrease the risk of disease progression and reduce treatment costs for patients. This study aims to present an optimal combination of intelligent algorithms using advanced machine learning methods, including different feature selection and classification methods based on clinical data and blood factors. In this work, data were collected from 176 patients to investigate NASH, and 19 features were extracted. We then sought the best combination of features using different feature selection algorithms, namely Feature Forward Selection (FFS), Minimum Redundancy Maximum Relevance (MRMR), and Mutual Information (MI). Finally, we used nine classifier frameworks with different mathematical mechanisms, including random forest (RF), logistic regression (LR), Linear Discriminant Analysis (LDA), AdaBoost, K nearest neighbors (KNN), multilayer perceptron model (MLP), support vector machine (SVM), and decision tree (DT), to estimate NASH. Our investigation revealed that the combination of dominant features, namely body mass index (BMI), glutamic pyruvic transaminase (GPT), total cholesterol (TC), high-density lipoprotein (HDL), Ezetimibe, lipoprotein level Lp(a), log_e(Lp(a)), total triglyceride (TG), Creatinine (Cre), HbA1c, Fibrate, and Sex, selected by the MRMR algorithm and classified by the RF method, provides the best trade-off between low computational effort and high performance, with accuracy, AUC, precision, and recall of 81.51 ± 9.35, 82.53 ± 11.24, 85.28 ± 9.68, and 89.49 ± 7.92, respectively. This study investigated which configuration of feature selection and classifier is most suitable for classifying NASH based on clinical data and blood factors. The proposed intelligent algorithm based on MRMR and the RF classifier can automatically diagnose NASH with appropriate performance and present an initial report without any further invasive methods. It also clarifies the diagnostic process and, as a result, supports the continuation of patients' prevention and treatment cycle.
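The MRMR step named in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes discrete feature values, uses the common "relevance minus mean redundancy" scoring, and runs on invented toy columns (`f1`, `f1_copy`, `f2` are hypothetical names, not the clinical features above).

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: choose k feature names, each time maximising
    relevance to the target minus mean redundancy with prior picks."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    while len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data (hypothetical): the target is 1 only when f1 and f2 are both 1.
features = {
    "f1":      [0, 0, 1, 1, 0, 0, 1, 1],
    "f1_copy": [0, 0, 1, 1, 0, 0, 1, 1],  # fully redundant with f1
    "f2":      [0, 1, 0, 1, 0, 1, 0, 1],  # independent of f1
}
target = [0, 0, 0, 1, 0, 0, 0, 1]

selected = mrmr(features, target, k=2)
print(selected)
```

The duplicate column is as relevant as `f1` on its own, but its redundancy penalty pushes it below `f2` once `f1` has been chosen, which is the behaviour MRMR is designed to produce.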
Affiliation(s)
- Hamed Zamanian
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2
Njei B, Osta E, Njei N, Al-Ajlouni YA, Lim JK. An explainable machine learning model for prediction of high-risk nonalcoholic steatohepatitis. Sci Rep 2024; 14:8589. [PMID: 38615137 PMCID: PMC11016071 DOI: 10.1038/s41598-024-59183-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/08/2024] [Indexed: 04/15/2024] Open
Abstract
Early identification of high-risk metabolic dysfunction-associated steatohepatitis (MASH) can offer patients access to novel therapeutic options and potentially decrease the risk of progression to cirrhosis. This study aimed to develop an explainable machine learning model for high-risk MASH prediction and compare its performance with well-established biomarkers. Data were derived from the National Health and Nutrition Examination Surveys (NHANES) 2017-March 2020, which included a total of 5281 adults with valid elastography measurements. We used a FAST score ≥ 0.35, calculated using liver stiffness measurement and controlled attenuation parameter values and aspartate aminotransferase levels, to identify individuals with high-risk MASH. We developed an ensemble-based machine learning XGBoost model to detect high-risk MASH and explored the model's interpretability using an explainable artificial intelligence SHAP method. The prevalence of high-risk MASH was 6.9%. Our XGBoost model achieved a high level of sensitivity (0.82), specificity (0.91), accuracy (0.90), and AUC (0.95) for identifying high-risk MASH. Our model demonstrated a superior ability to predict high-risk MASH vs. FIB-4, APRI, BARD, and MASLD fibrosis scores (AUC of 0.95 vs. 0.50, 0.50, 0.49 and 0.50, respectively). To explain the high performance of our model, we found that the top 5 predictors of high-risk MASH were ALT, GGT, platelet count, waist circumference, and age. We used an explainable ML approach to develop a clinically applicable model that outperforms commonly used clinical risk indices and could increase the identification of high-risk MASH patients in resource-limited settings.
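As a quick consistency check on the figures in this abstract, overall accuracy can be written as a prevalence-weighted mix of sensitivity and specificity. The sketch below assumes the reported sensitivity and specificity were measured on data with the same 6.9% prevalence, which the abstract does not state explicitly.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity and prevalence:
    correctly flagged positives plus correctly cleared negatives."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Values taken from the abstract; the shared-prevalence assumption is ours.
implied_accuracy = accuracy_from_rates(
    sensitivity=0.82, specificity=0.91, prevalence=0.069
)
print(f"{implied_accuracy:.3f}")
```

Under that assumption the implied value rounds to the reported accuracy of 0.90, so the three reported rates are mutually consistent.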
Affiliation(s)
- Basile Njei
- Section of Digestive Diseases, Yale School of Medicine, New Haven, CT, 06510, USA
- Global Clinical Scholars Research Program, Harvard Medical School, Boston, MA, USA
- Artificial Intelligence Programme, University of Oxford Said Business School, Oxford, UK
- Eri Osta
- University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Nelvis Njei
- Centers for Medicare and Medicaid Services, Baltimore, MD, USA
- Joseph K Lim
- Section of Digestive Diseases, Yale School of Medicine, New Haven, CT, 06510, USA
3
Zamanian H, Shalbaf A, Zali MR, Khalaj AR, Dehghan P, Tabesh M, Hatami B, Alizadehsani R, Tan RS, Acharya UR. Application of artificial intelligence techniques for non-alcoholic fatty liver disease diagnosis: A systematic review (2005-2023). Comput Methods Programs Biomed 2024; 244:107932. [PMID: 38008040 DOI: 10.1016/j.cmpb.2023.107932] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/19/2023] [Revised: 11/13/2023] [Accepted: 11/15/2023] [Indexed: 11/28/2023]
Abstract
BACKGROUND AND OBJECTIVES Non-alcoholic fatty liver disease (NAFLD) is a common liver disease with a rapidly growing incidence worldwide. For prognostication and therapeutic decisions, it is important to distinguish the pathological stages of NAFLD: steatosis, steatohepatitis, and liver fibrosis, which are definitively diagnosed on invasive biopsy. Non-invasive ultrasound (US) imaging, including the US elastography technique, and clinical parameters can be used to diagnose and grade NAFLD and its complications. Artificial intelligence (AI) is increasingly being harnessed for developing NAFLD diagnostic models based on clinical, biomarker, or imaging data. In this work, we systematically reviewed the literature for AI-enabled NAFLD diagnostic models based on US (including elastography) and clinical (including serological) data. METHODS We performed a comprehensive search on the Google Scholar, Scopus, and PubMed search engines for articles published between January 2005 and June 2023 related to AI models for NAFLD diagnosis based on US and/or clinical parameters, using the following search terms: "non-alcoholic fatty liver disease", "non-alcoholic steatohepatitis", "deep learning", "machine learning", "artificial intelligence", "ultrasound imaging", "sonography", "clinical information". RESULTS We reviewed 64 published models that used either US (including elastography) or clinical data input to detect the presence of NAFLD, non-alcoholic steatohepatitis, and/or fibrosis, and in some cases, the severity of steatosis, inflammation, and/or fibrosis as well. The performances of the published models were summarized and stratified by data input and algorithms used, which could be broadly divided into machine and deep learning approaches. CONCLUSION AI models based on US imaging and clinical data can reliably detect NAFLD and its complications, thereby reducing diagnostic costs and the need for invasive liver biopsy. The models offer advantages of efficiency, accuracy, and accessibility, and serve as virtual assistants for specialists to accelerate disease diagnosis and reduce treatment costs for patients and healthcare systems.
Affiliation(s)
- H Zamanian
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- A Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- M R Zali
- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- A R Khalaj
- Tehran obesity treatment center, Department of Surgery, Faculty of Medicine, Shahed University, Tehran, Iran
- P Dehghan
- Department of Radiology, Imaging Department, Taleghani Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- M Tabesh
- Cardiac Primary Prevention Research Center, Cardiovascular Diseases Research, Tehran University of Medical Sciences, Tehran, Iran
- B Hatami
- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- R Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, VIC, Australia
- Ru-San Tan
- National Heart Centre Singapore, Singapore 169609, Singapore; Duke-NUS Medical School, Singapore
- U Rajendra Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia; Centre for Health Research, University of Southern Queensland, Australia
4
Naderi Yaghouti AR, Zamanian H, Shalbaf A. Machine learning approaches for early detection of non-alcoholic steatohepatitis based on clinical and blood parameters. Sci Rep 2024; 14:2442. [PMID: 38287043 PMCID: PMC10824722 DOI: 10.1038/s41598-024-51741-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 01/09/2024] [Indexed: 01/31/2024] Open
Abstract
This study aims to develop a machine learning approach leveraging clinical data and blood parameters to predict non-alcoholic steatohepatitis (NASH) based on the NAFLD Activity Score (NAS). Using a dataset of 181 patients, we performed preprocessing including normalization and categorical encoding. To identify predictive features, we applied sequential forward selection (SFS), chi-square, analysis of variance (ANOVA), and mutual information (MI). The selected features were used to train machine learning classifiers including SVM, random forest, AdaBoost, LightGBM, and XGBoost. Hyperparameter tuning was done for each classifier using randomized search. Model evaluation was performed using leave-one-out cross-validation over 100 repetitions. Among the classifiers, random forest, combined with SFS feature selection and 10 features, obtained the best performance: Accuracy: 81.32% ± 6.43%, Sensitivity: 86.04% ± 6.21%, Specificity: 70.49% ± 8.12%, Precision: 81.59% ± 6.23%, and F1-score: 83.75% ± 6.23%. Our findings highlight the promise of machine learning for early, non-invasive diagnosis of NASH based on readily available clinical and blood data, and provide a compelling alternative to conventional diagnostic techniques. They also provide the basis for developing scalable approaches that can improve screening and monitoring of NASH progression.
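Sequential forward selection scored by leave-one-out cross-validation, the combination described in this abstract, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a 1-nearest-neighbour classifier stands in for the random forest so that the example needs no third-party library, a single leave-one-out pass is used rather than 100 repetitions, and the six rows of toy data are invented.

```python
def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    (a stand-in for random forest) using only the columns in `cols`."""
    hits = 0
    for i, row in enumerate(rows):
        # Nearest other sample by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def sequential_forward_selection(rows, labels, k):
    """Greedily add the column that most improves leave-one-out
    accuracy until k columns have been chosen."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: loo_accuracy(rows, labels, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise.
rows = [
    [0.1, 5.0],
    [0.2, 1.0],
    [0.3, 3.0],
    [0.9, 4.0],
    [1.0, 2.0],
    [1.1, 6.0],
]
labels = [0, 0, 0, 1, 1, 1]

chosen = sequential_forward_selection(rows, labels, k=1)
print(chosen, loo_accuracy(rows, labels, chosen))
```

Because the selection criterion is the classifier's own cross-validated score, SFS is a wrapper method; that makes it more expensive than filter methods such as chi-square, ANOVA, or MI, which rank features without training a model.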
Affiliation(s)
- Amir Reza Naderi Yaghouti
- Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Hamed Zamanian
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ahmad Shalbaf
- Department of Biomedical Engineering and Medical Physics, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
5
Thapa R, Garikipati A, Ciobanu M, Singh NP, Browning E, DeCurzio J, Barnes G, Dinenno FA, Mao Q, Das R. Machine Learning Differentiation of Autism Spectrum Sub-Classifications. J Autism Dev Disord 2023:10.1007/s10803-023-06121-4. [PMID: 37751097 DOI: 10.1007/s10803-023-06121-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 08/19/2023] [Indexed: 09/27/2023]
Abstract
PURPOSE Disorders on the autism spectrum have characteristics that can manifest as difficulties with communication, executive functioning, daily living, and more. These challenges can be mitigated with early identification. However, diagnostic criteria have changed from DSM-IV to DSM-5, which can make diagnosing a disorder on the autism spectrum complex. We evaluated machine learning to classify individuals as having one of three disorders of the autism spectrum under DSM-IV, or as non-spectrum. METHODS We employed machine learning to analyze retrospective data from 38,560 individuals. Inputs encompassed clinical, demographic, and assessment data. RESULTS The algorithm achieved AUROCs ranging from 0.863 to 0.980. The model correctly classified 80.5% of individuals; 12.6% of individuals from this dataset were misclassified with another disorder on the autism spectrum. CONCLUSION Machine learning can classify individuals as having a disorder on the autism spectrum or as non-spectrum using minimal data inputs.
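For readers unfamiliar with the AUROC values quoted here, the metric can be computed for one class against the rest as the fraction of positive/negative pairs that the model ranks correctly. The sketch below assumes a one-vs-rest reading of the reported range, which the abstract does not spell out, and uses invented scores.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for one class versus all others.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)
print(example_auroc)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based formulations give the same value more efficiently on large datasets.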
Affiliation(s)
- R Thapa
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- A Garikipati
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- M Ciobanu
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- N P Singh
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- E Browning
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- J DeCurzio
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- G Barnes
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- F A Dinenno
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- Q Mao
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
- R Das
- Montera, Inc dba Forta, 548 Market St, PMB 89605, San Francisco, CA, USA
6
Maharjan J, Garikipati A, Dinenno FA, Ciobanu M, Barnes G, Browning E, DeCurzio J, Mao Q, Das R. Machine learning determination of applied behavioral analysis treatment plan type. Brain Inform 2023; 10:7. [PMID: 36862316 PMCID: PMC9981822 DOI: 10.1186/s40708-023-00186-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/30/2022] [Accepted: 02/06/2023] [Indexed: 03/03/2023] Open
Abstract
BACKGROUND Applied behavioral analysis (ABA) is regarded as the gold standard treatment for autism spectrum disorder (ASD) and has the potential to improve outcomes for patients with ASD. It can be delivered at different intensities, which are classified as comprehensive or focused treatment approaches. Comprehensive ABA targets multiple developmental domains and involves 20-40 h/week of treatment. Focused ABA targets individual behaviors and typically involves 10-20 h/week of treatment. Determining the appropriate treatment intensity involves patient assessment by trained therapists; however, the final determination is highly subjective and lacks a standardized approach. In our study, we examined the ability of a machine learning (ML) prediction model to classify which treatment intensity would be best suited to individual patients with ASD who are undergoing ABA treatment. METHODS Retrospective data from 359 patients diagnosed with ASD were analyzed and included in the training and testing of an ML model for predicting comprehensive or focused treatment for individuals undergoing ABA treatment. Data inputs included demographics, schooling, behavior, skills, and patient goals. A gradient-boosted tree ensemble method, XGBoost, was used to develop the prediction model, which was then compared against a standard of care comparator encompassing features specified by the Behavior Analyst Certification Board treatment guidelines. Prediction model performance was assessed via area under the receiver-operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). RESULTS The prediction model achieved excellent performance for classifying patients in the comprehensive versus focused treatment groups (AUROC: 0.895; 95% CI 0.811-0.962) and outperformed the standard of care comparator (AUROC 0.767; 95% CI 0.629-0.891). The prediction model also achieved sensitivity of 0.789, specificity of 0.808, PPV of 0.6, and NPV of 0.913. Out of 71 patients whose data were employed to test the prediction model, only 14 misclassifications occurred. A majority of misclassifications (n = 10) indicated comprehensive ABA treatment for patients that had focused ABA treatment as the ground truth, therefore still providing a therapeutic benefit. The three most important features contributing to the model's predictions were bathing ability, age, and hours per week of past ABA treatment. CONCLUSION This research demonstrates that the ML prediction model performs well in classifying appropriate ABA treatment plan intensity using readily available patient data. This may aid in standardizing the process for determining appropriate ABA treatments, which can facilitate initiation of the most appropriate treatment intensity for patients with ASD and improve resource allocation.
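The four test-set metrics in this abstract can be tied back to a single confusion matrix. The counts below are a reconstruction, not figures stated by the authors: treating comprehensive ABA as the positive class (an assumption), 71 test patients with 14 errors, 10 of them false "comprehensive" calls, leaves 4 false "focused" calls, and a PPV of 0.6 then implies 15 true positives and 42 true negatives.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Reconstructed counts (assumption; see lead-in), positive = comprehensive ABA.
counts = {"tp": 15, "fp": 10, "fn": 4, "tn": 42}
metrics = diagnostic_metrics(**counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts every metric rounds to the value reported in the abstract, which suggests the reconstruction is at least consistent with the published results.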
Affiliation(s)
- Jenish Maharjan
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Anurag Garikipati
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Frank A. Dinenno
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Madalina Ciobanu
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Gina Barnes
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Ella Browning
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Jenna DeCurzio
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Qingqing Mao
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
- Ritankar Das
- Montera Inc. dba Forta, 548 Market St, San Francisco, CA PMB 89605 USA
7
Machine-Learning Algorithm for Predicting Fatty Liver Disease in a Taiwanese Population. J Pers Med 2022; 12:jpm12071026. [PMID: 35887527 PMCID: PMC9317783 DOI: 10.3390/jpm12071026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Revised: 06/18/2022] [Accepted: 06/22/2022] [Indexed: 12/04/2022] Open
Abstract
The rising incidence of fatty liver disease (FLD) poses a health challenge, and FLD is expected to be the leading global cause of liver-related morbidity and mortality in the near future. Early case identification is crucial for disease intervention. A retrospective cross-sectional study was performed on 31,930 Taiwanese subjects (25,544 in the training set and 6386 in the testing set) who had received health check-ups and abdominal ultrasounds in Changhua Christian Hospital from January 2009 to January 2019. Clinical and laboratory factors were included for analysis by different machine-learning algorithms. In addition, the performance of the machine-learning algorithms was compared with that of the fatty liver index (FLI). In total, 6658/25,544 (26.1%) and 1647/6386 (25.8%) subjects had moderate-to-severe liver disease in the training and testing sets, respectively. Five machine-learning models were examined and demonstrated exemplary performance in predicting FLD. Among these models, the xgBoost model had the highest area under the receiver operating characteristic curve (AUROC) (0.882), accuracy (0.833), F1 score (0.829), sensitivity (0.833), and specificity (0.683) compared with those of the neural network, logistic regression, random forest, and support vector machine-learning models. The xgBoost, neural network, and logistic regression models had a significantly higher AUROC than that of FLI. Body mass index was the most important feature for predicting FLD according to the feature ranking scores. The xgBoost model had the best overall prediction ability for diagnosing FLD in our study. Machine-learning algorithms provide considerable benefits for screening candidates with FLD.