1
Sezer S, Oter A, Ersoz B, Topcuoglu C, İbrahim Bulbul H, Sagiroglu S, Akin M, Yilmaz G. Explainable artificial intelligence for LDL cholesterol prediction and classification. Clin Biochem 2024; 130:110791. [PMID: 38977210 DOI: 10.1016/j.clinbiochem.2024.110791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 07/04/2024] [Accepted: 07/05/2024] [Indexed: 07/10/2024]
Abstract
INTRODUCTION: Monitoring low-density lipoprotein cholesterol (LDL-C) levels is essential in clinical practice because LDL-C levels are directly related to atherosclerotic heart disease risk, so measuring or estimating LDL-C is critical. The present study aims to evaluate Artificial Intelligence (AI) and Explainable AI (XAI) methodologies in predicting LDL-C levels while emphasizing the interpretability of these predictions.
MATERIALS AND METHODS: We retrospectively reviewed data from the Laboratory Information System (LIS) of Ankara Etlik City Hospital (AECH). We included 60,217 patients with standard lipid profiles (total cholesterol [TC], high-density lipoprotein cholesterol, and triglycerides) paired with same-day direct LDL-C results. AI methodologies, such as Gradient Boosting (GB), Random Forests (RF), Support Vector Machines (SVM), and Decision Trees (DT), were used to predict LDL-C, and the predictions were compared with directly measured LDL-C and with LDL-C calculated by formulas. XAI techniques such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to interpret the AI models and improve their explainability.
RESULTS: LDL-C values predicted with AI, especially RF or GB, correlated more strongly with directly measured LDL-C than did values calculated with formulas. SHAP and LIME both identified TC as the most influential factor in LDL-C prediction. Agreement with the treatment groups defined from measured LDL-C under the NCEP ATP III guidelines was higher for the AI-derived LDL-C groups than for the formula-derived groups.
CONCLUSIONS: AI is not only a reliable method but also an explainable one for LDL-C estimation and classification.
Affiliation(s)
- Sevilay Sezer
  - Department of Medical Biochemistry, Ministry of Health, Ankara Bilkent City Hospital, Ankara, Turkey
- Ali Oter
  - Department of Electronic and Automation, Kahramanmaraş Sütçü Imam University, Kahramanmaraş, Turkey
- Betul Ersoz
  - Artificial Intelligence and Big Data Analytics Security R&D Center, Gazi University, Ankara, Turkey
- Canan Topcuoglu
  - Department of Medical Biochemistry, Ministry of Health, Ankara Etlik City Hospital, Ankara, Turkey
- Halil İbrahim Bulbul
  - Department of Computer and Instructional Technologies Education, Gazi University, Ankara, Turkey
- Seref Sagiroglu
  - Artificial Intelligence and Big Data Analytics Security R&D Center, Gazi University, Ankara, Turkey
- Murat Akin
  - Artificial Intelligence and Big Data Analytics Security R&D Center, Gazi University, Ankara, Turkey
- Gulsen Yilmaz
  - Department of Medical Biochemistry, Ministry of Health, Ankara Bilkent City Hospital, Ankara, Turkey
  - Department of Medical Biochemistry, Ankara Yıldırım Beyazıt University Faculty of Medicine, Ankara, Turkey

2
Kim Y, Lee WK, Lee W. Prediction of low-density lipoprotein cholesterol levels using machine learning methods. Lab Med 2024; 55:471-484. [PMID: 38217551 DOI: 10.1093/labmed/lmad114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2024]
Abstract
OBJECTIVE: Low-density lipoprotein cholesterol (LDL-C) has been commonly calculated by equations, but their performance has not been entirely satisfactory. This study aimed to develop a more accurate LDL-C prediction model using machine learning methods.
METHODS: The study involved predicting directly measured LDL-C, using individual characteristics, lipid profiles, and other laboratory results as predictors. The models applied to predict LDL-C values were multiple regression, penalized regression, random forest, and XGBoost. Additionally, a novel 2-step prediction model was developed and introduced. The machine learning methods were evaluated against the Friedewald, Martin, and Sampson equations.
RESULTS: The Friedewald, Martin, and Sampson equations had root mean squared error (RMSE) values of 12.112, 8.084, and 8.492, respectively, whereas the 2-step prediction model showed the highest accuracy, with an RMSE of 7.015. The LDL-C levels were also classified as a categorical variable according to the diagnostic criteria of the dyslipidemia treatment guideline, and concordance rates were calculated between the predictive values obtained from each method and the directly measured ones. The 2-step prediction model had the highest concordance rate (85.1%).
CONCLUSION: The machine learning method can calculate LDL-C more accurately than existing equations. The proposed 2-step prediction model, in particular, outperformed the other machine learning methods.
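The two evaluation metrics named in this abstract, RMSE and the concordance rate between predicted and directly measured LDL-C categories, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the category cut-offs (100, 130, 160, 190 mg/dL) are an assumption chosen for the example, since the abstract does not state which thresholds its guideline uses.

```python
import math

# Assumed, illustrative LDL-C category boundaries in mg/dL.
# The abstract does not give the guideline's actual cut-offs.
CUTOFFS_MGDL = (100, 130, 160, 190)


def rmse(predicted, measured):
    """Root mean squared error between predicted and measured values."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def category(ldl_mgdl):
    """Index of the LDL-C category: number of cut-offs at or below the value."""
    return sum(ldl_mgdl >= cutoff for cutoff in CUTOFFS_MGDL)


def concordance_rate(predicted, measured):
    """Fraction of samples whose predicted and measured categories match."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(category(p) == category(m) for p, m in zip(predicted, measured))
    return matches / len(measured)


if __name__ == "__main__":
    # Made-up values, only to show the call pattern.
    predicted = [95.0, 135.0, 165.0]
    measured = [99.0, 128.0, 170.0]
    print(f"RMSE: {rmse(predicted, measured):.3f} mg/dL")
    print(f"Concordance: {concordance_rate(predicted, measured):.1%}")
```

A lower RMSE means predictions sit closer to the direct measurement on average, while the concordance rate reflects only whether a prediction lands in the same category as the measurement, so the two metrics can rank methods differently.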
Affiliation(s)
- Yoori Kim
  - Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Won Kyung Lee
  - Department of Prevention and Management, Inha University Hospital, School of Medicine, Inha University, Incheon, Republic of Korea
- Woojoo Lee
  - Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea

3
Çubukçu HC, Topcu Dİ, Yenice S. Machine learning-based clinical decision support using laboratory data. Clin Chem Lab Med 2024; 62:793-823. [PMID: 38015744 DOI: 10.1515/cclm-2023-1037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 11/17/2023] [Indexed: 11/30/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) are becoming vital in laboratory medicine and the broader context of healthcare. In this review article, we summarized the development of ML models and how they contribute to clinical laboratory workflow and improve patient outcomes. The process of ML model development involves data collection, data cleansing, feature engineering, model development, and optimization. These models, once finalized, are subjected to thorough performance assessments and validations. Recently, due to the complexity inherent in model development, automated ML tools were also introduced to streamline the process, enabling non-experts to create models. Clinical Decision Support Systems (CDSS) use ML techniques on large datasets to aid healthcare professionals in test result interpretation. They are revolutionizing laboratory medicine, enabling labs to work more efficiently with less human supervision across pre-analytical, analytical, and post-analytical phases. Despite contributions of the ML tools at all analytical phases, their integration presents challenges like potential model uncertainties, black-box algorithms, and deskilling of professionals. Additionally, acquiring diverse datasets is hard, and models' complexity can limit clinical use. In conclusion, ML-based CDSS in healthcare can greatly enhance clinical decision-making. However, successful adoption demands collaboration among professionals and stakeholders, utilizing hybrid intelligence, external validation, and performance assessments.
Affiliation(s)
- Hikmet Can Çubukçu
  - General Directorate of Health Services, Rare Diseases Department, Turkish Ministry of Health, Ankara, Türkiye
  - Hacettepe University Institute of Informatics, Ankara, Türkiye
- Deniz İlhan Topcu
  - Health Sciences University İzmir Tepecik Education and Research Hospital, Medical Biochemistry, İzmir, Türkiye
- Sedef Yenice
  - Florence Nightingale Hospital, Istanbul, Türkiye

4
Paydaş Hataysal E, Körez MK, Yeşildal F, İşman FK. A comparative evaluation of low-density lipoprotein cholesterol estimation: Machine learning algorithms versus various equations. Clin Chim Acta 2024; 557:117853. [PMID: 38461864 DOI: 10.1016/j.cca.2024.117853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 02/10/2024] [Accepted: 03/01/2024] [Indexed: 03/12/2024]
Abstract
BACKGROUND: Given the critical importance of low-density lipoprotein cholesterol (LDL-C) levels in determining cardiovascular risk, it is essential to measure LDL-C accurately. Since the Friedewald formula generates incorrect predictions in many circumstances, new equations have been developed to overcome its shortcomings. This study aimed to compare estimated LDL-C with directly measured LDL-C (dLDL-C), and to assess performance in predicting LDL-C, using the Friedewald, extended Martin-Hopkins, Sampson, de Cordova, and Vujovic formulas and five machine learning (ML) algorithms.
METHODS: A total of 29,504 samples from the ISLAB-2 Core Laboratory were included in the study. All statistical analyses were performed using the R statistical language, version 4.1.2.
RESULTS: The Bayesian-Regularized Neural Network (BRNN) (r = 0.957) and Random Forest (RF) (r = 0.957) algorithms showed a higher correlation with dLDL-C than the equations did in the all-testing dataset. All ML algorithms demonstrated less bias against dLDL-C than the pre-existing LDL-C equations and outperformed the LDL-C estimation equations in terms of concordance in the all-testing dataset.
CONCLUSIONS: The results of our research indicate that, compared with conventional equations, ML algorithms are much more effective in predicting LDL-C. ML algorithms, aided by a vast dataset, could predict LDL-C levels even when triglyceride levels are high, where use of the Friedewald formula is limited.
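The correlation (r) and bias figures reported in this abstract compare each estimate with dLDL-C. A minimal sketch of both quantities is given below, assuming Pearson's correlation coefficient for r and the mean difference (estimate minus dLDL-C) for bias. The abstract does not specify either definition, and the function names are hypothetical.

```python
import math


def pearson_r(estimated, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(estimated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_e == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_m)


def mean_bias(estimated, measured):
    """Mean difference (estimated - measured); assumed definition of bias."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(e - m for e, m in zip(estimated, measured)) / len(measured)


if __name__ == "__main__":
    # Made-up values in mg/dL, only to show the call pattern.
    estimated = [105.0, 98.0, 151.0]
    measured = [100.0, 100.0, 150.0]
    print(f"r = {pearson_r(estimated, measured):.3f}")
    print(f"bias = {mean_bias(estimated, measured):.2f} mg/dL")
```

The two measures answer different questions: a method can correlate almost perfectly with dLDL-C while still being systematically too high or too low, which is why the abstract reports both.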
Affiliation(s)
- Esra Paydaş Hataysal
  - Department of Biochemistry, Göztepe Prof. Dr. Süleyman Yalçın City Hospital, Istanbul, Turkey
- Muslu Kazım Körez
  - Department of Biostatistics, Selcuk University Faculty of Medicine, Konya, Turkey
- Fatih Yeşildal
  - Department of Biochemistry, Haydarpaşa Numune Training and Research Hospital, Istanbul, Turkey
- Ferruh Kemal İşman
  - Department of Biochemistry, Göztepe Prof. Dr. Süleyman Yalçın City Hospital, Istanbul, Turkey

5
Martins J, Steyn N, Rossouw HM, Pillay TS. Best practice for LDL-cholesterol: when and how to calculate. J Clin Pathol 2023; 76:145-152. [PMID: 36650044 DOI: 10.1136/jcp-2022-208480] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/31/2022] [Accepted: 11/23/2022] [Indexed: 01/19/2023]
Abstract
The lipid profile is important in the risk assessment for cardiovascular disease. The lipid profile includes total cholesterol, high-density lipoprotein (HDL)-cholesterol, triglycerides (TGs) and low-density lipoprotein (LDL)-cholesterol (LDL-C). LDL-C has traditionally been calculated using the Friedewald equation, which is invalid with TGs greater than 4.5 mmol/L and is based on the assumption that the ratio of TG to cholesterol in very-low-density lipoprotein (VLDL) is 5 when measured in mg/dL. LDL-C can be quantified with a reference method, beta-quantification, which involves ultracentrifugation and is unsuitable for routine use. Direct measurement of LDL-C was expected to provide a solution with high TGs. However, it has some challenges because of a lack of standardisation between the reagents and assays from different manufacturers, as well as the additional costs. Furthermore, mild hypertriglyceridaemia also distorts direct LDL-C measurements. Because of the limitations of the Friedewald equation, alternatives have been derived. Newer equations include the Sampson-National Institutes of Health (NIH) equation 2 and the Martin-Hopkins equation. The Sampson-NIH2 equation was derived using beta-quantification in a population with high TG and multiple least squares regression to calculate VLDL-C, using TGs and non-HDL-C as independent variables. These data were used in a second equation to calculate LDL-C. The Sampson-NIH2 equation can be used with TGs up to 9 mmol/L. The Martin-Hopkins equation uses a 180-cell stratification of TG/non-HDL-C to determine the TG:VLDL-C ratio and can be used with TGs up to 4.5 mmol/L. Recently, an extended Martin-Hopkins equation has become available for TGs up to 9.04 mmol/L. This article discusses the best practice approach to calculating LDL-C based on the available evidence.
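The Friedewald calculation described above can be sketched in a few lines. This is an illustrative sketch only: the function name and the error handling are hypothetical, inputs are assumed to be in mg/dL, and the 400 mg/dL cut-off is used as the approximate mg/dL equivalent of the 4.5 mmol/L triglyceride limit. The fixed TG:VLDL-C ratio of 5 is the assumption stated in the abstract.

```python
# Assumed mg/dL equivalent of the 4.5 mmol/L triglyceride limit.
FRIEDEWALD_TG_LIMIT_MGDL = 400.0

# Fixed TG:VLDL-C ratio assumed by the Friedewald equation (mg/dL units).
TG_TO_VLDL_RATIO = 5.0


def friedewald_ldl_mgdl(total_chol, hdl_chol, triglycerides):
    """Estimate LDL-C in mg/dL as TC - HDL-C - TG/5.

    All inputs are in mg/dL. Raises ValueError when triglycerides exceed
    the range in which the equation is considered valid.
    """
    if triglycerides > FRIEDEWALD_TG_LIMIT_MGDL:
        raise ValueError(
            "Friedewald equation is not valid for triglycerides above "
            f"{FRIEDEWALD_TG_LIMIT_MGDL:.0f} mg/dL"
        )
    estimated_vldl = triglycerides / TG_TO_VLDL_RATIO
    return total_chol - hdl_chol - estimated_vldl


if __name__ == "__main__":
    # Made-up lipid profile: TC 200, HDL-C 50, TG 150 (all mg/dL).
    print(friedewald_ldl_mgdl(200.0, 50.0, 150.0))  # 200 - 50 - 30 = 120.0
```

The fixed divisor is the weak point the newer equations address: Martin-Hopkins replaces it with a ratio looked up from TG and non-HDL-C strata, and Sampson-NIH2 replaces the VLDL-C term with a regression estimate. Neither is reproduced here because the abstract does not give their coefficients.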
Affiliation(s)
- Janine Martins
  - Chemical Pathology, University of Pretoria, Pretoria, South Africa
- Nicolene Steyn
  - Chemical Pathology, University of Pretoria, Pretoria, South Africa
- H Muller Rossouw
  - Chemical Pathology, University of Pretoria, Pretoria, South Africa
- Tahir S Pillay
  - Chemical Pathology, University of Pretoria, Pretoria, South Africa
  - Chemical Pathology, University of Cape Town, Cape Town, South Africa

6
Abstract
PURPOSE OF REVIEW: The reference method for low-density lipoprotein-cholesterol (LDL-C) quantitation is β-quantification, a technically demanding method that is not convenient for routine use. Indirect calculation methods to estimate LDL-C, including the Friedewald equation, have been used since 1972. This calculation has several recognized limitations, especially inaccurate results for triglycerides (TG) >4.5 mmol/l (>400 mg/dl). In view of this, several other equations were developed across the world in different datasets. The purpose of this review was to analyze the best method to calculate LDL-C in clinical practice by reviewing studies that compared equations with measured LDL-C.
RECENT FINDINGS: We identified 45 studies that compared these formulae. The Martin/Hopkins equation uses an adjustable factor for TG:very low-density lipoprotein-cholesterol ratios, has been validated in a large dataset, and has been shown to provide more accurate LDL-C calculation, especially when LDL-C is <1.81 mmol/l (<70 mg/dl) and with elevated TG. However, it is not in widespread international use because of the need for further validation and the use of the adjustable factor. The Sampson equation was developed for patients with TG up to 9 mmol/l (800 mg/dl), was based on β-quantification, and performs well on high-TG, postprandial and low LDL-C samples, similar to direct LDL-C.
SUMMARY: The choice of equation should take into account the level of triglycerides. Further validation of different equations is required in different populations.
Affiliation(s)
- Janine Martins
  - Department of Chemical Pathology, Faculty of Health Sciences, University of Pretoria and National Health Laboratory Service Tshwane Academic Division
  - Department of Public Health Medicine, School of Health System & Public Health, University of Pretoria, Pretoria, South Africa
- H Muller Rossouw
  - Department of Chemical Pathology, Faculty of Health Sciences, University of Pretoria and National Health Laboratory Service Tshwane Academic Division
- Tahir S Pillay
  - Department of Chemical Pathology, Faculty of Health Sciences, University of Pretoria and National Health Laboratory Service Tshwane Academic Division