1
Goldman O, Ben-Assuli O, Ababa S, Rogowski O, Berliner S. Predicting metabolic syndrome: Machine learning techniques for improved preventive medicine. Health Informatics J 2025; 31:14604582251315602. [PMID: 39819060 DOI: 10.1177/14604582251315602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
Objectives: Metabolic syndrome (MetS) has a significant impact on health. MetS is the umbrella term for a group of interdependent metabolic threats that contribute to the emergence of diseases that can lead to death. This study was designed to better predict the risks associated with MetS to enable medical personnel to make more optimal preventive medical decisions. Study design: Data from a large hospital survey database was used to train data mining classification techniques to predict patient-level risk subsequent to extensive data engineering that included aggregating predictors from multiple visits. Methods: A prospective group of seemingly healthy volunteers from the database was studied based on data obtained during their regular annual health checkups. Results: After aggregating the variables over time, the findings indicated that the predictive power of our model outperformed methods presented in other studies (AUC = 0.947). Specific lifestyle factors were identified as contributing to MetS. Conclusion: Involvement to avoid recurring diseases can significantly decrease medical problems and treatment expenses. The findings emphasize the importance of using predictive tools in healthcare and preventive medicine. The results can be used for future prevention strategies that encourage lifestyle changes and implement directed medical treatment protocols to decrease the burden of illness.
Affiliation(s)
- Orit Goldman
  - Faculty of Business Administration, Ono Academic College, Kiryat Ono, Israel
- Ofir Ben-Assuli
  - Faculty of Business Administration, Ono Academic College, Kiryat Ono, Israel
- Shimon Ababa
  - Faculty of Business Administration, Ono Academic College, Kiryat Ono, Israel
- Ori Rogowski
  - Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center, Sackler Faculty of Medicine, Tel-Aviv University, Tel Aviv, Israel
- Shlomo Berliner
  - Departments of Internal Medicine "C", "D" and "E", Tel-Aviv Sourasky Medical Center, Sackler Faculty of Medicine, Tel-Aviv University, Tel Aviv, Israel
2
Kawakita T, Greenland P, Pemberton VL, Grobman WA, Silver RM, Bairey Merz CN, McNeil RB, Haas DM, Reddy UM, Simhan H, Saade GR. Prediction of metabolic syndrome following a first pregnancy. Am J Obstet Gynecol 2024; 231:649.e1-649.e19. [PMID: 38527600 PMCID: PMC11424779 DOI: 10.1016/j.ajog.2024.03.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/09/2024] [Accepted: 03/21/2024] [Indexed: 03/27/2024]
Abstract
BACKGROUND The prevalence of metabolic syndrome is rapidly increasing in the United States. We hypothesized that prediction models using data obtained during pregnancy can accurately predict the future development of metabolic syndrome. OBJECTIVE This study aimed to develop machine learning models to predict the development of metabolic syndrome using factors ascertained in nulliparous pregnant individuals. STUDY DESIGN This was a secondary analysis of a prospective cohort study (Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be Heart Health Study [nuMoM2b-HHS]). Data were collected from October 2010 to October 2020, and analyzed from July 2023 to October 2023. Participants had in-person visits 2 to 7 years after their first delivery. The primary outcome was metabolic syndrome, defined by the National Cholesterol Education Program Adult Treatment Panel III criteria, which was measured within 2 to 7 years after delivery. A total of 127 variables that were obtained during pregnancy were evaluated. The data set was randomly split into a training set (70%) and a test set (30%). We developed a random forest model and a lasso regression model using variables obtained during pregnancy. We compared the area under the receiver operating characteristic curve for both models. Using the model with the better area under the receiver operating characteristic curve, we developed models that included fewer variables based on SHAP (SHapley Additive exPlanations) values and compared them with the original model. The final model chosen would have fewer variables and noninferior areas under the receiver operating characteristic curve. RESULTS A total of 4225 individuals met the inclusion criteria; the mean (standard deviation) age was 27.0 (5.6) years. Of these, 754 (17.8%) developed metabolic syndrome. 
The area under the receiver operating characteristic curve of the random forest model was 0.878 (95% confidence interval, 0.846-0.909), which was higher than the 0.850 of the lasso model (95% confidence interval, 0.811-0.888; P<.001). Therefore, random forest models using fewer variables were developed. The random forest model with the top 3 variables (high-density lipoprotein, insulin, and high-sensitivity C-reactive protein) was chosen as the final model because it had the area under the receiver operating characteristic curve of 0.867 (95% confidence interval, 0.839-0.895), which was not inferior to the original model (P=.08). The area under the receiver operating characteristic curve of the final model in the test set was 0.847 (95% confidence interval, 0.821-0.873). An online application of the final model was developed (https://kawakita.shinyapps.io/metabolic/). CONCLUSION We developed a model that can accurately predict the development of metabolic syndrome in 2 to 7 years after delivery.
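The evaluation steps described in this abstract (a random 70/30 split, then comparing models by area under the receiver operating characteristic curve) can be illustrated with a minimal sketch. It is not the authors' code: the function names `train_test_split` and `roc_auc` and the seed value are assumptions made for illustration, and the AUC is computed with the pairwise (Mann-Whitney) definition rather than any particular library routine.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and return (train, test) lists.

    `seed` is a hypothetical choice; it only makes the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be the usual choice for a cohort of several thousand participants.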
Affiliation(s)
- Tetsuya Kawakita
  - Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
- Philip Greenland
  - Departments of Preventive Medicine and Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Victoria L Pemberton
  - Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
- William A Grobman
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, OH
- Robert M Silver
  - Department of Obstetrics and Gynecology, University of Utah, Salt Lake City, UT
- C Noel Bairey Merz
  - Barbra Streisand Women's Heart Center, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- David M Haas
  - Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN
- Uma M Reddy
  - Department of Obstetrics and Gynecology, Columbia University, New York, NY
- Hyagriv Simhan
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh, Pittsburgh, PA
- George R Saade
  - Department of Obstetrics and Gynecology, Eastern Virginia Medical School, Norfolk, VA
3
Mariam A, Javidi H, Zabor EC, Zhao R, Radivoyevitch T, Rotroff DM. Unsupervised clustering of longitudinal clinical measurements in electronic health records. PLOS Digital Health 2024; 3:e0000628. [PMID: 39405315 PMCID: PMC11478862 DOI: 10.1371/journal.pdig.0000628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 08/30/2024] [Indexed: 10/19/2024]
Abstract
Longitudinal electronic health records (EHR) can be utilized to identify patterns of disease development and progression in real-world settings. Unsupervised temporal matching algorithms are being repurposed to EHR from signal processing- and protein-sequence alignment tasks where they have shown immense promise for gaining insight into disease. The robustness of these algorithms for classifying EHR clinical data remains to be determined. Timeseries compiled from clinical measurements, such as blood pressure, have far more irregularity in sampling and missingness than the data for which these algorithms were developed, necessitating a systematic evaluation of these methods. We applied 30 state-of-the-art unsupervised machine learning algorithms to 6,912 systematically generated simulated clinical datasets across five parameters. These algorithms included eight temporal matching algorithms with fourteen partitional and eight fuzzy clustering methods. Nemenyi tests were used to determine differences in accuracy using the Adjusted Rand Index (ARI). Dynamic time warping and its lower-bound variants had the highest accuracies across all cohorts (median ARI>0.70). All 30 methods were better at discriminating classes with differences in magnitude compared to differences in trajectory shapes. Missingness impacted accuracies only when classes were different by trajectory shape. The method with the highest ARI was then used to cluster a large pediatric metabolic syndrome (MetS) cohort (N = 43,426). We identified three unique childhood BMI patterns with high average cluster consensus (>70%). The algorithm identified a cluster with consistently high BMI which had the greatest risk of MetS, consistent with prior literature (OR = 4.87, 95% CI: 3.93-6.12). 
While these algorithms have been shown to have similar accuracies for regular timeseries, their accuracies in clinical applications vary substantially in discriminating differences in shape and especially with moderate to high missingness (>10%). This systematic assessment also shows that the most robust algorithms tested here can derive meaningful insights from longitudinal clinical data.
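Dynamic time warping, which this abstract reports as the most accurate temporal matching approach, can be sketched in a few lines. The version below is the textbook quadratic-time recurrence with an absolute-difference cost. It is an illustrative assumption, not the authors' implementation, and it does not include the lower-bound variants or the clustering step they evaluated; the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the standard recurrence
        D[i][j] = |a[i] - b[j]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    and keeps only two rows of the table in memory.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

Because the warping path may repeat points, two series of different lengths can still align at zero cost (for example `[1, 2, 3]` and `[1, 2, 2, 3]`), which is one reason the method suits irregularly sampled clinical measurements.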
Affiliation(s)
- Arshiya Mariam
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, Ohio, United States of America
- Hamed Javidi
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Department of Electrical Engineering and Computer Science, Cleveland State University, Cleveland, Ohio, United States of America
- Emily C. Zabor
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Ran Zhao
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Tomas Radivoyevitch
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Center for Quantitative Metabolic Research, Cleveland Clinic, Cleveland, Ohio, United States of America
  - Department of Electrical Engineering and Computer Science, Cleveland State University, Cleveland, Ohio, United States of America
  - Endocrinology and Metabolism Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
4
Lee M, Park T, Shin JY, Park M. A comprehensive multi-task deep learning approach for predicting metabolic syndrome with genetic, nutritional, and clinical data. Sci Rep 2024; 14:17851. [PMID: 39090161 PMCID: PMC11294629 DOI: 10.1038/s41598-024-68541-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 07/24/2024] [Indexed: 08/04/2024]
Abstract
Metabolic syndrome (MetS) is a complex disorder characterized by a cluster of metabolic abnormalities, including abdominal obesity, hypertension, elevated triglycerides, reduced high-density lipoprotein cholesterol, and impaired glucose tolerance. It poses a significant public health concern, as individuals with MetS are at an increased risk of developing cardiovascular diseases and type 2 diabetes. Early and accurate identification of individuals at risk for MetS is essential. Various machine learning approaches have been employed to predict MetS, such as logistic regression, support vector machines, and several boosting techniques. However, these methods use MetS as a binary status and do not consider that MetS comprises five components. Therefore, a method that focuses on these characteristics of MetS is needed. In this study, we propose a multi-task deep learning model designed to predict MetS and its five components simultaneously. The benefit of multi-task learning is that it can manage multiple tasks with a single model, and learning related tasks may enhance the model's predictive performance. To assess the efficacy of our proposed method, we compared its performance with that of several single-task approaches, including logistic regression, support vector machine, CatBoost, LightGBM, XGBoost and one-dimensional convolutional neural network. For the construction of our multi-task deep learning model, we utilized data from the Korean Association Resource (KARE) project, which includes 352,228 single nucleotide polymorphisms (SNPs) from 7729 individuals. We also considered lifestyle, dietary, and socio-economic factors that affect chronic diseases, in addition to genomic data. By evaluating metrics such as accuracy, precision, F1-score, and the area under the receiver operating characteristic curve, we demonstrate that our multi-task learning model surpasses traditional single-task machine learning models in predicting MetS.
Affiliation(s)
- Minhyuk Lee
  - Department of Statistics, Korea University, Seoul, Republic of Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Republic of Korea
- Ji-Yeon Shin
  - Department of Preventive Medicine, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Mira Park
  - Department of Preventive Medicine, School of Medicine, Eulji University, Daejeon, Republic of Korea
5
Boitor O, Stoica F, Mihăilă R, Stoica LF, Stef L. Automated Machine Learning to Develop Predictive Models of Metabolic Syndrome in Patients with Periodontal Disease. Diagnostics (Basel) 2023; 13:3631. [PMID: 38132215 PMCID: PMC10743072 DOI: 10.3390/diagnostics13243631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
Metabolic syndrome is experiencing a concerning and escalating rise in prevalence today. The link between metabolic syndrome and periodontal disease is a highly relevant area of research. Some studies have suggested a bidirectional relationship between metabolic syndrome and periodontal disease, where one condition may exacerbate the other. Furthermore, the existence of periodontal disease among these individuals significantly impacts overall health management. This research focuses on the relationship between periodontal disease and metabolic syndrome, while also incorporating data on general health status and overall well-being. We aimed to develop advanced machine learning models that efficiently identify key predictors of metabolic syndrome, a significant emphasis being placed on thoroughly explaining the predictions generated by the models. We studied a group of 296 patients, hospitalized in SCJU Sibiu, aged between 45-79 years, of which 57% had metabolic syndrome. The patients underwent dental consultations and subsequently responded to a dedicated questionnaire, along with a standard EuroQol 5-Dimensions 5-Levels (EQ-5D-5L) questionnaire. The following data were recorded: DMFT (Decayed, Missing due to caries, and Filled Teeth), CPI (Community Periodontal Index), periodontal pockets depth, loss of epithelial insertion, bleeding after probing, frequency of tooth brushing, regular dental control, cardiovascular risk, carotid atherosclerosis, and EQ-5D-5L score. We used Automated Machine Learning (AutoML) frameworks to build predictive models in order to determine which of these risk factors exhibits the most robust association with metabolic syndrome. To gain confidence in the results provided by the machine learning models provided by the AutoML pipelines, we used SHapley Additive exPlanations (SHAP) values for the interpretability of these models, from a global and local perspective. 
The obtained results confirm that the severity of periodontal disease, high cardiovascular risk, and low EQ-5D-5L score have the greatest impact in the occurrence of metabolic syndrome.
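The idea behind the SHAP values mentioned in this abstract can be illustrated by computing exact Shapley values for a tiny model. The sketch below is an assumption-laden simplification: it enumerates every feature subset (feasible only for a handful of features), treats a feature as "absent" by substituting a single baseline value, and uses the hypothetical name `shapley_values`. It is not the SHAP library and not the authors' AutoML pipeline.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline).

    `model` is any callable taking a list of feature values. Features
    outside the coalition are replaced by the matching `baseline` entry.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

For a linear model the attributions reduce to coefficient times the difference from baseline, and in every case they sum to `model(x) - model(baseline)`, which is the property that makes per-patient (local) explanations add up to the prediction.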
Affiliation(s)
- Ovidiu Boitor
  - Dental Medicine Research Center, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
- Florin Stoica
  - Department of Mathematics and Informatics, Research Center in Informatics and Information Technology, Faculty of Sciences, “Lucian Blaga” University, 550024 Sibiu, Romania
- Romeo Mihăilă
  - Department of Internal Medicine, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
- Laura Florentina Stoica
  - Department of Mathematics and Informatics, Research Center in Informatics and Information Technology, Faculty of Sciences, “Lucian Blaga” University, 550024 Sibiu, Romania
- Laura Stef
  - Department of Oral Health, Dental Medicine Research Center, Faculty of Medicine, “Lucian Blaga” University, 550024 Sibiu, Romania
6
Kumar VS, Kumar PR, Yadalam PK, Anegundi RV, Shrivastava D, Alfurhud AA, Almaktoom IT, Alftaikhah SAA, Alsharari AHL, Srivastava KC. Machine learning in the detection of dental cyst, tumor, and abscess lesions. BMC Oral Health 2023; 23:833. [PMID: 37932703 PMCID: PMC10626702 DOI: 10.1186/s12903-023-03571-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/24/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Dental panoramic radiographs are utilized in computer-aided image analysis, which detects abnormal tissue masses by analyzing the produced image capacity to recognize patterns of intensity fluctuations. This is done to reduce the need for invasive biopsies to arrive at a diagnosis. The aim of the current study was to examine and compare the accuracy of several texture analysis techniques, such as Grey Level Run Length Matrix (GLRLM), Grey Level Co-occurrence Matrix (GLCM), and wavelet analysis in recognizing dental cyst, tumor, and abscess lesions. MATERIALS & METHODS The current retrospective study retrieved a total of 172 dental panoramic radiographs with lesions including dental cysts, tumors, or abscesses. Radiographs that failed to meet technical criteria for diagnostic quality (such as significant overlap of teeth, a diffuse image, or distortion) were excluded from the sample. The methodology adopted in the study comprised five stages. At first, the radiographs were improved, and the area of interest was segmented manually. A variety of feature extraction techniques, such as GLCM, GLRLM, and wavelet analysis, were used to gather information from the area of interest. Later, the lesions were classified as a cyst, tumor, or abscess using a support vector machine (SVM) classifier. Eventually, the data was transferred into a Microsoft Excel spreadsheet and Statistical Package for the Social Sciences (SPSS) (version 21) was used to conduct the statistical analysis. Initially, descriptive statistics were computed. For inferential analysis, statistical significance was determined by a p value < 0.05. The sensitivity, specificity, and accuracy were used to find the significant difference between assessed and actual diagnosis. RESULTS The findings demonstrate that 98% accuracy was achieved using GLCM, 91% accuracy using wavelet analysis, and 95% accuracy using GLRLM in distinguishing between dental cyst, tumor, and abscess lesions. 
The area under curve (AUC) number indicates that GLCM achieves a high degree of accuracy. The results achieved excellent accuracy (98%) using GLCM. CONCLUSION The GLCM features can be used for further research. After improving the performance and training, it can support routine histological diagnosis and can assist the clinicians in arriving at accurate and spontaneous treatment plans.
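The grey level co-occurrence matrix named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a small list of lists already quantised to integer grey levels, the pixel offset defaults to one step to the right, and the function names `glcm`, `contrast`, and `angular_second_moment` are hypothetical. It is not the authors' feature-extraction code, and the support vector machine classification step is not shown.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised grey level co-occurrence matrix for offset (dx, dy).

    `image` is a list of rows of integers in range(levels). Entry [i][j]
    is the probability that a pixel of level i has a neighbour of level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of p[i][j] * (i - j)^2: large when neighbours differ strongly."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


def angular_second_moment(p):
    """Sum of squared probabilities: large for uniform textures."""
    return sum(v * v for row in p for v in row)
```

Texture features such as these, computed over the manually segmented region of interest, would form the numeric vector passed to a classifier.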
Affiliation(s)
- Vyshiali Sivaram Kumar
  - Department of Public Health Dentistry, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Pradeep R Kumar
  - Department of Public Health Dentistry, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Pradeep Kumar Yadalam
  - Department of Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Raghavendra Vamsi Anegundi
  - Department of Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
- Deepti Shrivastava
  - Department of Preventive Dentistry, College of Dentistry, Jouf University, 72345, Sakaka, Saudi Arabia
- Ahmed Ata Alfurhud
  - Oral Surgery Department, Institute of Dentistry, Queen Mary University of London, London, E1 2AD, UK
  - College of Dentistry, Jouf University, 72345, Sakaka, Saudi Arabia
- Kumar Chandan Srivastava
  - Department of Oral & Maxillofacial Surgery & Diagnostic Sciences, College of Dentistry, Jouf University, 72345, Sakaka, Saudi Arabia
  - Department of Oral Medicine and Radiology, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, 602105, India