Reference Citation Analysis

For: Emmanuel T, Maupong T, Mpoeleng D, Semong T, Mphago B, Tabona O. A survey on missing data in machine learning. J Big Data 2021;8:140. [PMID: 34722113 PMCID: PMC8549433 DOI: 10.1186/s40537-021-00516-9] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 05/22/2021] [Accepted: 09/12/2021] [Indexed: 05/04/2023]

Cited by Other Article(s)

Mpouzika M, Karanikola M, Blot S. The conundrum of predicting neurological outcomes in non-traumatic coma patients: True prediction or "Flipping a Coin"? Intensive Crit Care Nurs 2024;83:103707. [PMID: 38636295 DOI: 10.1016/j.iccn.2024.103707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2024]

Bellmann L, Wiederhold AJ, Trübe L, Twerenbold R, Ückert F, Gottfried K. Introducing Attribute Association Graphs to Facilitate Medical Data Exploration: Development and Evaluation Using Epidemiological Study Data. JMIR Med Inform 2024;12:e49865. [PMID: 39046780 DOI: 10.2196/49865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/11/2023] [Accepted: 05/04/2024] [Indexed: 07/25/2024] Open

Abstract

BACKGROUND

Interpretability and intuitive visualization facilitate medical knowledge generation through big data. In addition, robustness to high-dimensional and missing data is a requirement for statistical approaches in the medical domain. A method tailored to the needs of physicians must meet all the abovementioned criteria.

OBJECTIVE

This study aims to develop an accessible tool for visual data exploration without the need for programming knowledge, adjusting complex parameterizations, or handling missing data. We sought to base the statistical analysis on the setting of disease and control cohorts familiar to clinical researchers. We aimed to guide the user by identifying and highlighting data patterns associated with disease and to reveal relations between attributes within the data set.

METHODS

We introduce the attribute association graph, a novel graph structure designed for visual data exploration using robust statistical metrics. The nodes capture frequencies of participant attributes in disease and control cohorts as well as deviations between groups. The edges represent conditional relations between attributes. The graph is visualized using the Neo4j (Neo4j, Inc) data platform and can be interactively explored without the need for technical knowledge. Nodes with high deviations between cohorts and edges of noticeable conditional relationship are highlighted to guide the user during the exploration. The graph is accompanied by a dashboard visualizing variable distributions. For evaluation, we applied the graph and dashboard to the Hamburg City Health Study data set, a large cohort study conducted in the city of Hamburg, Germany. All data structures can be accessed freely by researchers, physicians, and patients. In addition, we developed a user test conducted with physicians incorporating the System Usability Scale, individual questions, and user tasks.

RESULTS

We evaluated the attribute association graph and dashboard through an exemplary data analysis of participants with a general cardiovascular disease in the Hamburg City Health Study data set. All results extracted from the graph structure and dashboard are in accordance with findings from the literature, except for unusually low cholesterol levels in participants with cardiovascular disease, which could be induced by medication. In addition, 95% CIs of Pearson correlation coefficients were calculated for all associations identified during the data analysis, confirming the results. In addition, a user test with 10 physicians assessing the usability of the proposed methods was conducted. A System Usability Scale score of 70.5% and average successful task completion of 81.4% were reported.

CONCLUSIONS

The proposed attribute association graph and dashboard enable intuitive visual data exploration. They are robust to high-dimensional as well as missing data and require no parameterization. The usability for clinicians was confirmed via a user test, and the validity of the statistical results was confirmed by associations known from literature and standard statistical inference.

Napravnik M, Hržić F, Tschauner S, Štajduhar I. Building RadiologyNET: an unsupervised approach to annotating a large-scale multimodal medical database. BioData Min 2024;17:22. [PMID: 38997749 PMCID: PMC11245804 DOI: 10.1186/s13040-024-00373-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 06/30/2024] [Indexed: 07/14/2024] Open

Mulat Tebeje T, Kindie Yenit M, Gedlu Nigatu S, Bizuneh Mengistu S, Kidie Tesfie T, Byadgie Gelaw N, Moges Chekol Y. Prediction of diabetic retinopathy among type 2 diabetic patients in University of Gondar Comprehensive Specialized Hospital, 2006-2021: A prognostic model. Int J Med Inform 2024;190:105536. [PMID: 38970878 DOI: 10.1016/j.ijmedinf.2024.105536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 06/26/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024]

Abstract

BACKGROUND

There has been a paucity of evidence for the development of a prediction model for diabetic retinopathy (DR) in Ethiopia. Predicting the risk of developing DR based on the patient's demographic, clinical, and behavioral data is helpful in resource-limited areas where regular screening for DR is not available, and it can guide practitioners in estimating the future risk of their patients.

METHODS

A retrospective follow-up study was conducted at the University of Gondar (UoG) Comprehensive Specialized Hospital from January 2006 to May 2021 among 856 patients with type 2 diabetes (T2DM). Variables were selected using the Least Absolute Shrinkage and Selection Operator (LASSO) regression. The data were validated by 10-fold cross-validation. Four ML techniques (naïve Bayes, K-nearest neighbor, decision tree, and logistic regression) were employed. The performance of each algorithm was measured, and logistic regression was a well-performing algorithm. After multivariable logistic regression and model reduction, a nomogram was developed to predict the individual risk of DR.

RESULTS

Logistic regression was the best algorithm for predicting DR with an area under the curve of 92%, sensitivity of 87%, specificity of 83%, precision of 84%, F1-score of 85%, and accuracy of 85%. The logistic regression model selected seven predictors: total cholesterol, duration of diabetes, glycemic control, adherence to anti-diabetic medications, other microvascular complications of diabetes, sex, and hypertension. A nomogram was developed and deployed as a web-based application. A decision curve analysis showed that the model was useful in clinical practice and was better than treating all or none of the patients.

CONCLUSIONS

The model has excellent performance and a better net benefit to be utilized in clinical practice to show the future probability of having DR. Identifying those with a higher risk of DR helps in the early identification and intervention of DR.

Maekawa E, Grua EM, Nakamura CA, Scazufca M, Araya R, Peters T, van de Ven P. Bayesian Networks for Prescreening in Depression: Algorithm Development and Validation. JMIR Ment Health 2024;11:e52045. [PMID: 38963925 PMCID: PMC11258528 DOI: 10.2196/52045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 04/02/2024] [Accepted: 04/17/2024] [Indexed: 07/06/2024] Open

Abstract

BACKGROUND

Identifying individuals with depressive symptomatology (DS) promptly and effectively is of paramount importance for providing timely treatment. Machine learning models have shown promise in this area; however, studies often fall short in demonstrating the practical benefits of using these models and fail to provide tangible real-world applications.

OBJECTIVE

This study aims to establish a novel methodology for identifying individuals likely to exhibit DS, identify the most influential features in a more explainable way via probabilistic measures, and propose tools that can be used in real-world applications.

METHODS

The study used 3 data sets: PROACTIVE, the Brazilian National Health Survey (Pesquisa Nacional de Saúde [PNS]) 2013, and PNS 2019, comprising sociodemographic and health-related features. A Bayesian network was used for feature selection. Selected features were then used to train machine learning models to predict DS, operationalized as a score of ≥10 on the 9-item Patient Health Questionnaire. The study also analyzed the impact of varying sensitivity rates on the reduction of screening interviews compared to a random approach.

RESULTS

The methodology allows the users to make an informed trade-off among sensitivity, specificity, and a reduction in the number of interviews. At the thresholds of 0.444, 0.412, and 0.472, determined by maximizing the Youden index, the models achieved sensitivities of 0.717, 0.741, and 0.718, and specificities of 0.644, 0.737, and 0.766 for PROACTIVE, PNS 2013, and PNS 2019, respectively. The area under the receiver operating characteristic curve was 0.736, 0.801, and 0.809 for these 3 data sets, respectively. For the PROACTIVE data set, the most influential features identified were postural balance, shortness of breath, and how old people feel they are. In the PNS 2013 data set, the features were the ability to do usual activities, chest pain, sleep problems, and chronic back problems. The PNS 2019 data set shared 3 of the most influential features with the PNS 2013 data set. However, the difference was the replacement of chronic back problems with verbal abuse. It is important to note that the features contained in the PNS data sets differ from those found in the PROACTIVE data set. An empirical analysis demonstrated that using the proposed model led to a potential reduction in screening interviews of up to 52% while maintaining a sensitivity of 0.80.

CONCLUSIONS

This study developed a novel methodology for identifying individuals with DS, demonstrating the utility of using Bayesian networks to identify the most significant features. Moreover, this approach has the potential to substantially reduce the number of screening interviews while maintaining high sensitivity, thereby facilitating improved early identification and intervention strategies for individuals experiencing DS.
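The abstract above reports decision thresholds chosen by maximizing the Youden index. As a minimal illustrative sketch (not the authors' code), the index J = sensitivity + specificity - 1 can be maximized over candidate thresholds as follows. The function name `youden_threshold` and the toy data are assumptions made for this example, and the rule "predict positive when score >= threshold" is likewise an assumed convention.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    Hypothetical helper for illustration. Labels are 1 (positive) or 0
    (negative); a case is predicted positive when its score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = None
    # Every distinct observed score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y != 1 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or j > best[3]:
            best = (t, sens, spec, j)
    return best[:3]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(youden_threshold(labels, scores))
```

Ties in J are broken toward the lowest threshold here, which favors sensitivity; a screening application might prefer a different tie-break.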

Sajdeya R, Narouze S. Harnessing artificial intelligence for predicting and managing postoperative pain: a narrative literature review. Curr Opin Anaesthesiol 2024:00001503-990000000-00209. [PMID: 39011674 DOI: 10.1097/aco.0000000000001408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]

Yehuala TZ, Agimas MC, Derseh NM, Wubante SM, Fente BM, Yismaw GA, Tesfie TK. Machine learning algorithms to predict healthcare-seeking behaviors of mothers for acute respiratory infections and their determinants among children under five in sub-Saharan Africa. Front Public Health 2024;12:1362392. [PMID: 38962762 PMCID: PMC11220189 DOI: 10.3389/fpubh.2024.1362392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 06/03/2024] [Indexed: 07/05/2024] Open

Abstract

Background

Acute respiratory infections (ARIs) are the leading cause of death in children under the age of 5 globally. Maternal healthcare-seeking behavior may help minimize mortality associated with ARIs, since mothers make decisions about the kind and frequency of healthcare services for their children. Therefore, this study aimed to predict the absence of maternal healthcare-seeking behavior and identify its associated factors among children under the age of 5 in sub-Saharan Africa (SSA) using machine learning models.

Methods

The sub-Saharan African countries' demographic health survey was the source of the dataset. We used a weighted sample of 16,832 under-five children in this study. The data were processed using Python (version 3.9), and machine learning models such as extreme gradient boosting (XGB), random forest, decision tree, logistic regression, and Naïve Bayes were applied. In this study, we used evaluation metrics, including the AUC ROC curve, accuracy, precision, recall, and F-measure, to assess the performance of the predictive models.

Result

In this study, a weighted sample of 16,832 under-five children was used in the final analysis. Among the proposed machine learning models, the random forest (RF) was the best-performing model, with an accuracy of 88.89%, a precision of 89.5%, an F-measure of 83%, an AUC ROC curve of 95.8%, and a recall of 77.6% in predicting the absence of mothers' healthcare-seeking behavior for ARIs. The accuracy for Naïve Bayes was the lowest (66.41%) when compared to other proposed models. No media exposure, living in rural areas, not breastfeeding, poor wealth status, home delivery, no ANC visit, no maternal education, mothers' age group of 35-49 years, and distance to health facilities were significant predictors for the absence of mothers' healthcare-seeking behaviors for ARIs. On the other hand, undernourished children with stunting, underweight, and wasting status, diarrhea, birth size, married women, being a male or female sex child, and having a maternal occupation were significantly associated with good maternal healthcare-seeking behaviors for ARIs among under-five children.

Conclusion

The RF model provides greater predictive power for estimating mothers' healthcare-seeking behaviors based on ARI risk factors. Machine learning could help achieve early prediction and intervention in children with high-risk ARIs. This leads to a recommendation for policy direction to reduce child mortality due to ARIs in sub-Saharan countries.

Tran VN, Zhou W, Kim T, Mazepa V, Valdayskikh V, Ivanov VY. Daily station-level records of air temperature, snow depth, and ground temperature in the Northern Hemisphere. Sci Data 2024;11:645. [PMID: 38890309 PMCID: PMC11189437 DOI: 10.1038/s41597-024-03483-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open

Chen Y, Lin F, Wang K, Chen F, Wang R, Lai M, Chen C, Wang R. Development of a predictive model for 1-year postoperative recovery in patients with lumbar disk herniation based on deep learning and machine learning. Front Neurol 2024;15:1255780. [PMID: 38919973 PMCID: PMC11197993 DOI: 10.3389/fneur.2024.1255780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Accepted: 05/23/2024] [Indexed: 06/27/2024] Open

Coats TJ, Mirkes EM. Missing data in emergency care: a pitfall in the interpretation of analysis and research based on electronic patient records. Emerg Med J 2024:emermed-2024-214097. [PMID: 38834288 DOI: 10.1136/emermed-2024-214097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Accepted: 04/15/2024] [Indexed: 06/06/2024]

Li YX, Liu YC, Wang M, Huang YL. Prediction of gestational diabetes mellitus at the first trimester: machine-learning algorithms. Arch Gynecol Obstet 2024;309:2557-2566. [PMID: 37477677 DOI: 10.1007/s00404-023-07131-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 06/27/2023] [Indexed: 07/22/2023]

Lee CC, Su SY, Sung SF. Machine learning-based survival analysis approaches for predicting the risk of pneumonia post-stroke discharge. Int J Med Inform 2024;186:105422. [PMID: 38518677 DOI: 10.1016/j.ijmedinf.2024.105422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Revised: 02/25/2024] [Accepted: 03/19/2024] [Indexed: 03/24/2024]

Abstract

BACKGROUND

Post-stroke pneumonia (PSP) is common among stroke patients. PSP occurring after hospital discharge continues to increase the risk of poor functional outcomes and death among stroke survivors. Currently, there is no prediction model specifically designed to predict the occurrence of PSP beyond the acute stage of stroke. This study aimed to explore the use of machine learning (ML) methods in predicting the risk of PSP after hospital discharge.

METHODS

This study analyzed data from 5,754 hospitalized stroke patients. The dataset was randomly divided into a training set and a holdout test set, with a ratio of 80:20. Several clinical and laboratory variables were utilized as predictors and different ML algorithms were employed to model time-to-event data. The ML model's predictive performance was compared to existing risk-scoring systems. A model-agnostic method based on Shapley additive explanations was utilized to interpret the ML model.

RESULTS

The study found that 5.7% of the study patients experienced pneumonia within one year after discharge. Based on repeated 5-fold cross-validation on the training set, the random survival forest (RSF) model had the highest C-index among the various ML algorithms and traditional Cox regression analysis. The final RSF model achieved a C-index of 0.787 (95% confidence interval: 0.737-0.840) on the holdout test set, outperforming five existing risk-scoring systems. The top three important predictors were the Glasgow Coma Scale score, age, and length of hospital stay.

CONCLUSIONS

The RSF model demonstrated superior discriminative ability compared to other ML algorithms and traditional Cox regression analysis, suggesting a non-linear relationship between predictors and outcomes. The developed ML model can be integrated into the hospital information system to provide personalized risk assessments.
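The abstract above compares survival models by their C-index. As a minimal sketch of how a concordance index can be computed for right-censored data (a simplified form of Harrell's C that skips pairs with tied event times, and not the implementation used in the study), consider the following. The function name `concordance_index` and the toy inputs are assumptions for illustration only.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index for right-censored data.

    Hypothetical helper for illustration.
    times:  observed follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event strictly
            # before patient j's observed time (event or censoring).
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3]
    events = [1, 1, 0]
    print(concordance_index(times, events, [0.9, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.787 should be read.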

Tutsoy O, Sumbul HE. A novel deep machine learning algorithm with dimensionality and size reduction approaches for feature elimination: thyroid cancer diagnoses with randomly missing data. Brief Bioinform 2024;25:bbae344. [PMID: 39007597 PMCID: PMC11247408 DOI: 10.1093/bib/bbae344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 06/04/2024] [Accepted: 07/02/2024] [Indexed: 07/16/2024] Open

Xie P, Wang H, Xiao J, Xu F, Liu J, Chen Z, Zhao W, Hou S, Wu D, Ma Y, Xiao J. Development and Validation of an Explainable Deep Learning Model to Predict In-Hospital Mortality for Patients With Acute Myocardial Infarction: Algorithm Development and Validation Study. J Med Internet Res 2024;26:e49848. [PMID: 38728685 PMCID: PMC11127140 DOI: 10.2196/49848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 10/02/2023] [Accepted: 04/02/2024] [Indexed: 05/12/2024] Open

Abstract

BACKGROUND

Acute myocardial infarction (AMI) is one of the most severe cardiovascular diseases and is associated with a high risk of in-hospital mortality. However, the current deep learning models for in-hospital mortality prediction lack interpretability.

OBJECTIVE

This study aims to establish an explainable deep learning model to provide individualized in-hospital mortality prediction and risk factor assessment for patients with AMI.

METHODS

In this retrospective multicenter study, we used data for consecutive patients hospitalized with AMI from the Chongqing University Central Hospital between July 2016 and December 2022 and the Electronic Intensive Care Unit Collaborative Research Database. These patients were randomly divided into training (7668/10,955, 70%) and internal test (3287/10,955, 30%) data sets. In addition, data of patients with AMI from the Medical Information Mart for Intensive Care database were used for external validation. Deep learning models were used to predict in-hospital mortality in patients with AMI, and they were compared with linear and tree-based models. The Shapley Additive Explanations method was used to explain the model with the highest area under the receiver operating characteristic curve in both the internal test and external validation data sets to quantify and visualize the features that drive predictions.

RESULTS

A total of 10,955 patients with AMI who were admitted to Chongqing University Central Hospital or included in the Electronic Intensive Care Unit Collaborative Research Database were randomly divided into a training data set of 7668 (70%) patients and an internal test data set of 3287 (30%) patients. A total of 9355 patients from the Medical Information Mart for Intensive Care database were included for independent external validation. In-hospital mortality occurred in 8.74% (670/7668), 8.73% (287/3287), and 9.12% (853/9355) of the patients in the training, internal test, and external validation cohorts, respectively. The Self-Attention and Intersample Attention Transformer model performed best in both the internal test data set and the external validation data set among the 9 prediction models, with the highest area under the receiver operating characteristic curve of 0.86 (95% CI 0.84-0.88) and 0.85 (95% CI 0.84-0.87), respectively. Older age, high heart rate, and low body temperature were the 3 most important predictors of increased mortality, according to the explanations of the Self-Attention and Intersample Attention Transformer model.

CONCLUSIONS

The explainable deep learning model that we developed could provide estimates of mortality and visual contribution of the features to the prediction for a patient with AMI. The explanations suggested that older age, unstable vital signs, and metabolic disorders may increase the risk of mortality in patients with AMI.

Kazdaghli S, Kerenidis I, Kieckbusch J, Teare P. Improved clinical data imputation via classical and quantum determinantal point processes. eLife 2024;12:RP89947. [PMID: 38722146 PMCID: PMC11081629 DOI: 10.7554/elife.89947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024] Open

Tejani AS, Ng YS, Xi Y, Rayan JC. Understanding and Mitigating Bias in Imaging Artificial Intelligence. Radiographics 2024;44:e230067. [PMID: 38635456 DOI: 10.1148/rg.230067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2024]

Abstract

Artificial intelligence (AI) algorithms are prone to bias at multiple stages of model development, with potential for exacerbating health disparities. However, bias in imaging AI is a complex topic that encompasses multiple coexisting definitions. Bias may refer to unequal preference to a person or group owing to preexisting attitudes or beliefs, either intentional or unintentional. However, cognitive bias refers to systematic deviation from objective judgment due to reliance on heuristics, and statistical bias refers to differences between true and expected values, commonly manifesting as systematic error in model prediction (ie, a model with output unrepresentative of real-world conditions). Clinical decisions informed by biased models may lead to patient harm due to action on inaccurate AI results or exacerbate health inequities due to differing performance among patient populations. However, while inequitable bias can harm patients in this context, a mindful approach leveraging equitable bias can address underrepresentation of minority groups or rare diseases. Radiologists should also be aware of bias after AI deployment such as automation bias, or a tendency to agree with automated decisions despite contrary evidence. Understanding common sources of imaging AI bias and the consequences of using biased models can guide preventive measures to mitigate its impact. Accordingly, the authors focus on sources of bias at stages along the imaging machine learning life cycle, attempting to simplify potentially intimidating technical terminology for general radiologists using AI tools in practice or collaborating with data scientists and engineers for AI tool development. The authors review definitions of bias in AI, describe common sources of bias, and present recommendations to guide quality control measures to mitigate the impact of bias in imaging AI. Understanding the terms featured in this article will enable a proactive approach to identifying and mitigating bias in imaging AI. Published under a CC BY 4.0 license. Test Your Knowledge questions for this article are available in the supplemental material. See the invited commentary by Rouzrokh and Erickson in this issue.

Shi S, Bao J, Guo Z, Han Y, Xu Y, Egbeagu UU, Zhao L, Jiang N, Sun L, Liu X, Liu W, Chang N, Zhang J, Sun Y, Xu X, Fu S. Improving prediction of N₂O emissions during composting using model-agnostic meta-learning. Sci Total Environ 2024;922:171357. [PMID: 38431167 DOI: 10.1016/j.scitotenv.2024.171357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 02/24/2024] [Accepted: 02/27/2024] [Indexed: 03/05/2024]

Syed T, Krujatz F, Ihadjadene Y, Mühlstädt G, Hamedi H, Mädler J, Urbas L. A review on machine learning approaches for microalgae cultivation systems. Comput Biol Med 2024;172:108248. [PMID: 38493599 DOI: 10.1016/j.compbiomed.2024.108248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 02/15/2024] [Accepted: 03/06/2024] [Indexed: 03/19/2024]

Ellen JG, Matos J, Viola M, Gallifant J, Quion J, Anthony Celi L, Abu Hussein NS. Participant flow diagrams for health equity in AI. J Biomed Inform 2024;152:104631. [PMID: 38548006 DOI: 10.1016/j.jbi.2024.104631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 12/29/2023] [Accepted: 03/26/2024] [Indexed: 04/01/2024]

Chadaga K, Prabhu S, Sampathila N, Chadaga R, Bhat D, Sharma AK, Swathi KS. SADXAI: Predicting social anxiety disorder using multiple interpretable artificial intelligence techniques. SLAS Technol 2024;29:100129. [PMID: 38508237 DOI: 10.1016/j.slast.2024.100129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 03/17/2024] [Indexed: 03/22/2024]

Muludi K, Setianingsih R, Sholehurrohman R, Junaidi A. Exploiting nearest neighbor data and fuzzy membership function to address missing values in classification. PeerJ Comput Sci 2024;10:e1968. [PMID: 38660203 PMCID: PMC11042039 DOI: 10.7717/peerj-cs.1968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/07/2024] [Indexed: 04/26/2024]

Parsaei M, Arvin A, Taebi M, Seyedmirzaei H, Cattarinussi G, Sambataro F, Pigoni A, Brambilla P, Delvecchio G. Machine Learning for prediction of violent behaviors in schizophrenia spectrum disorders: a systematic review. Front Psychiatry 2024;15:1384828. [PMID: 38577400 PMCID: PMC10991827 DOI: 10.3389/fpsyt.2024.1384828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 03/08/2024] [Indexed: 04/06/2024] Open

Affiliation(s)

Mohammadamin Parsaei: Maternal, Fetal & Neonatal Research Center, Family Health Research Institute, Tehran University of Medical Sciences, Tehran, Iran
Alireza Arvin: Center for Orthopedic Trans-disciplinary Applied Research (COTAR), Tehran University of Medical Sciences, Tehran, Iran
Morvarid Taebi: Center for Orthopedic Trans-disciplinary Applied Research (COTAR), Tehran University of Medical Sciences, Tehran, Iran
Homa Seyedmirzaei: Sports Medicine Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran
Giulia Cattarinussi: Department of Neuroscience (DNS), Padua Neuroscience Center, University of Padova, Padua, Italy; Padua Neuroscience Center, University of Padova, Padua, Italy; Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, Kings College London, London, United Kingdom
Fabio Sambataro: Department of Neuroscience (DNS), Padua Neuroscience Center, University of Padova, Padua, Italy; Padua Neuroscience Center, University of Padova, Padua, Italy
Alessandro Pigoni: Social and Affective Neuroscience Group, MoMiLab, Institutions, Markets, Technologies (IMT) School for Advanced Studies Lucca, Lucca, Italy; Department of Pathophysiology and Transplantation, University of Milan, Milan, Italy
Paolo Brambilla: Social and Affective Neuroscience Group, MoMiLab, Institutions, Markets, Technologies (IMT) School for Advanced Studies Lucca, Lucca, Italy; Department of Pathophysiology and Transplantation, University of Milan, Milan, Italy; Department of Neurosciences and Mental Health, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy
Giuseppe Delvecchio: Department of Neurosciences and Mental Health, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy

Wang M, Ye XW, Ying XH, Jia JD, Ding Y, Zhang D, Sun F. Data Imputation of Soil Pressure on Shield Tunnel Lining Based on Random Forest Model. SENSORS (BASEL, SWITZERLAND) 2024;24:1560. [PMID: 38475093 DOI: 10.3390/s24051560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 02/24/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024]

Li J, Guo S, Ma R, He J, Zhang X, Rui D, Ding Y, Li Y, Jian L, Cheng J, Guo H. Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets. BMC Med Res Methodol 2024;24:41. [PMID: 38365610 PMCID: PMC10870437 DOI: 10.1186/s12874-024-02173-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 02/05/2024] [Indexed: 02/18/2024] Open

Abstract

BACKGROUND

Missing data is frequently an inevitable issue in cohort studies and it can adversely affect the study's findings. We assess the effectiveness of eight frequently utilized statistical and machine learning (ML) imputation methods for dealing with missing data in predictive modelling of cohort study datasets. This evaluation is based on real data and predictive models for cardiovascular disease (CVD) risk.

METHODS

The data is from a real-world cohort study in Xinjiang, China. It includes personal information, physical examination data, questionnaires, and laboratory biochemical results from 10,164 subjects with a total of 37 variables. Simple imputation (Simple), regression imputation (Regression), expectation-maximization (EM), multiple imputation (MICE), K nearest neighbor classification (KNN), clustering imputation (Cluster), random forest (RF), and decision tree (Cart) were the chosen imputation methods. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are utilised to assess the performance of different methods for missing data imputation at a missing rate of 20%. The datasets processed with different missing data imputation methods were employed to construct a CVD risk prediction model utilizing the support vector machine (SVM). The predictive performance was then compared using the area under the curve (AUC).

RESULTS

The most effective imputation results were attained by KNN (MAE: 0.2032, RMSE: 0.7438, AUC: 0.730, CI: 0.719-0.741) and RF (MAE: 0.3944, RMSE: 1.4866, AUC: 0.777, CI: 0.769-0.785). The subsequent best performances were achieved by EM, Cart, and MICE, while Simple, Regression, and Cluster attained the worst performances. The CVD risk prediction model constructed using the complete data (AUC: 0.804, CI: 0.796-0.812) was compared with all other models, with p<0.05.

CONCLUSION

KNN and RF exhibit superior performance and are more adept at imputing missing data in predictive modelling of cohort study datasets.
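
The evaluation protocol described in the METHODS above (hide a known fraction of values, impute them, then score the imputed values against the hidden truth with RMSE and MAE) can be sketched as follows. This is a minimal illustration only, not the authors' code: the toy column, the helper names (`mask_values`, `mean_impute`, `rmse`, `mae`), and the use of mean imputation as a stand-in for the "Simple" method are all assumptions made for the example.

```python
import math
import random


def mask_values(column, missing_rate, seed=0):
    """Return a copy of `column` with a fraction of entries set to None,
    together with the sorted indices that were masked."""
    rng = random.Random(seed)
    n_missing = round(len(column) * missing_rate)
    masked_idx = sorted(rng.sample(range(len(column)), n_missing))
    masked = list(column)
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx


def mean_impute(column):
    """Fill every None with the mean of the observed values (assumed 'Simple' method)."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def rmse(truth, imputed, idx):
    """Root mean square error over the masked positions only."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))


def mae(truth, imputed, idx):
    """Mean absolute error over the masked positions only."""
    return sum(abs(truth[i] - imputed[i]) for i in idx) / len(idx)


# Hypothetical complete column; a real study would use the cohort variables.
truth = [float(v) for v in range(1, 21)]
masked, idx = mask_values(truth, missing_rate=0.20)  # 20% missing, as in the abstract
imputed = mean_impute(masked)

print("masked positions:", idx)
print("RMSE:", rmse(truth, imputed, idx))
print("MAE:", mae(truth, imputed, idx))
```

Scoring only the masked positions keeps the error from being diluted by values that were never missing; any other imputer can be swapped in for `mean_impute` and compared on the same mask.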

Affiliation(s)

JiaHang Li Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
ShuXia Guo Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
RuLin Ma Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
Jia He Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
XiangHui Zhang Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
DongSheng Rui Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
YuSong Ding Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
Yu Li Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China
LeYao Jian Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
Jing Cheng Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China
Heng Guo Department of Public Health, Shihezi University School of Medicine, North 2th Road, Shihezi, 832003, Xinjiang, China. Key Laboratory for Prevention and Control of Emerging Infectious Diseases and Public Health Security, the Xinjiang Production and Construction Corps, Shihezi, Xinjiang, 832000, China.

Ghaedi H, Davey SK, Feilotter H. Variant Classification Discordance: Contributing Factors and Predictive Models. J Mol Diagn 2024;26:115-126. [PMID: 38008287 DOI: 10.1016/j.jmoldx.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 08/04/2023] [Accepted: 11/07/2023] [Indexed: 11/28/2023] Open

Joseph J, Niemczak C, Lichtenstein J, Kobrina A, Magohe A, Leigh S, Ealer C, Fellows A, Reike C, Massawe E, Gui J, Buckey JC. Central auditory test performance predicts future neurocognitive function in children living with and without HIV. Sci Rep 2024;14:2712. [PMID: 38302516 PMCID: PMC10834399 DOI: 10.1038/s41598-024-52380-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 01/18/2024] [Indexed: 02/03/2024] Open

Jacob Junior AFL, do Carmo FA, de Santana AL, Santana EEC, Lobato FMF. EvoImp: Multiple Imputation of Multi-label Classification data with a genetic algorithm. PLoS One 2024;19:e0297147. [PMID: 38241256 PMCID: PMC10798481 DOI: 10.1371/journal.pone.0297147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 12/28/2023] [Indexed: 01/21/2024] Open

Chung CW, Chou SC, Hsiao TH, Zhang GJ, Chung YF, Chen YM. Machine learning approaches to identify systemic lupus erythematosus in anti-nuclear antibody-positive patients using genomic data and electronic health records. BioData Min 2024;17:1. [PMID: 38183082 PMCID: PMC10770905 DOI: 10.1186/s13040-023-00352-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 12/19/2023] [Indexed: 01/07/2024] Open

Weng X, Song H, Lin Y, Wu Y, Zhang X, Liu B, Yang J. A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks. Comput Biol Med 2024;168:107687. [PMID: 38007974 DOI: 10.1016/j.compbiomed.2023.107687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 10/07/2023] [Accepted: 11/06/2023] [Indexed: 11/28/2023]

Xu Z, Tang J, Qi C, Yao D, Liu C, Zhan Y, Lukasiewicz T. Cross-domain attention-guided generative data augmentation for medical image analysis with limited data. Comput Biol Med 2024;168:107744. [PMID: 38006826 DOI: 10.1016/j.compbiomed.2023.107744] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Revised: 11/12/2023] [Accepted: 11/20/2023] [Indexed: 11/27/2023]

Abstract

Data augmentation is widely applied to medical image analysis tasks in limited datasets with imbalanced classes and insufficient annotations. However, traditional augmentation techniques cannot supply extra information, making the performance of diagnosis unsatisfactory. GAN-based generative methods have thus been proposed to obtain additional useful information to realize more effective data augmentation; but existing generative data augmentation techniques mainly encounter two problems: (i) Current generative data augmentation lacks the capability to use cross-domain differential information to extend limited datasets. (ii) The existing generative methods cannot provide effective supervised information in medical image segmentation tasks. To solve these problems, we propose an attention-guided cross-domain tumor image generation model (CDA-GAN) with an information enhancement strategy. The CDA-GAN can generate diverse samples to expand the scale of datasets, improving the performance of medical image diagnosis and treatment tasks. In particular, we incorporate channel attention into a CycleGAN-based cross-domain generation network that captures inter-domain information and generates positive or negative samples of brain tumors. In addition, we propose a semi-supervised spatial attention strategy to guide spatial information of features at the pixel level in tumor generation. Furthermore, we add spectral normalization to prevent the discriminator from mode collapse and stabilize the training procedure. Finally, to resolve an inapplicability problem in the segmentation task, we further propose an application strategy of using this data augmentation model to achieve more accurate medical image segmentation with limited data.
Experimental studies on two public brain tumor datasets (BraTS and TCIA) show that the proposed CDA-GAN model greatly outperforms the state-of-the-art generative data augmentation in both practical medical image classification tasks and segmentation tasks; e.g. CDA-GAN is 0.50%, 1.72%, 2.05%, and 0.21% better than the best SOTA baseline in terms of ACC, AUC, Recall, and F1, respectively, in the classification task of BraTS, while its improvements w.r.t. the best SOTA baseline in terms of Dice, Sens, HD95, and mIOU, in the segmentation task of TCIA are 2.50%, 0.90%, 14.96%, and 4.18%, respectively.

Yoon M, Park JJ, Hur T, Hua CH, Hussain M, Lee S, Choi DJ. Application and Potential of Artificial Intelligence in Heart Failure: Past, Present, and Future. INTERNATIONAL JOURNAL OF HEART FAILURE 2024;6:11-19. [PMID: 38303917 PMCID: PMC10827704 DOI: 10.36628/ijhf.2023.0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 02/03/2024]

Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology 2024;49:285-290. [PMID: 37667021 PMCID: PMC10700348 DOI: 10.1038/s41386-023-01724-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 08/24/2023] [Accepted: 08/27/2023] [Indexed: 09/06/2023]

Murray JD, Lange JJ, Bennett-Lenane H, Holm R, Kuentz M, O'Dwyer PJ, Griffin BT. Advancing algorithmic drug product development: Recommendations for machine learning approaches in drug formulation. Eur J Pharm Sci 2023;191:106562. [PMID: 37562550 DOI: 10.1016/j.ejps.2023.106562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 07/09/2023] [Accepted: 08/07/2023] [Indexed: 08/12/2023]

Ferri P, Romero-Garcia N, Badenes R, Lora-Pablos D, Morales TG, Gómez de la Cámara A, García-Gómez JM, Sáez C. Extremely missing numerical data in Electronic Health Records for machine learning can be managed through simple imputation methods considering informative missingness: A comparative of solutions in a COVID-19 mortality case study. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023;242:107803. [PMID: 37703700 DOI: 10.1016/j.cmpb.2023.107803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 08/28/2023] [Accepted: 09/05/2023] [Indexed: 09/15/2023]

Li Q, He Y, Pan J. CrossFuse-XGBoost: accurate prediction of the maximum recommended daily dose through multi-feature fusion, cross-validation screening and extreme gradient boosting. Brief Bioinform 2023;25:bbad511. [PMID: 38216539 PMCID: PMC10786712 DOI: 10.1093/bib/bbad511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 12/04/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024] Open

Francoeur PG, Koes DR. Expanding Training Data for Structure-Based Receptor-Ligand Binding Affinity Regression through Imputation of Missing Labels. ACS OMEGA 2023;8:41680-41688. [PMID: 37970017 PMCID: PMC10634251 DOI: 10.1021/acsomega.3c05931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 10/10/2023] [Accepted: 10/17/2023] [Indexed: 11/17/2023]

Kim P, Serov N, Falchevskaya A, Shabalkin I, Dmitrenko A, Kladko D, Vinogradov V. Quantifying the Efficacy of Magnetic Nanoparticles for MRI and Hyperthermia Applications via Machine Learning Methods. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2023;19:e2303522. [PMID: 37563807 DOI: 10.1002/smll.202303522] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/26/2023] [Revised: 07/16/2023] [Indexed: 08/12/2023]

Spence C, Shah OA, Cebula A, Tucker K, Sochart D, Kader D, Asopa V. Machine learning models to predict surgical case duration compared to current industry standards: scoping review. BJS Open 2023;7:zrad113. [PMID: 37931236 PMCID: PMC10630142 DOI: 10.1093/bjsopen/zrad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 11/08/2023] Open

Zieliński K, Drabczyk D, Kunicki M, Drzyzga D, Kloska A, Rumiński J. Evaluating the risk of endometriosis based on patients' self-assessment questionnaires. Reprod Biol Endocrinol 2023;21:102. [PMID: 37898817 PMCID: PMC10612251 DOI: 10.1186/s12958-023-01156-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2023] [Accepted: 10/23/2023] [Indexed: 10/30/2023] Open

Abstract

BACKGROUND

Endometriosis is a condition that significantly affects the quality of life of about 10% of reproductive-aged women. It is characterized by the presence of tissue similar to the uterine lining (endometrium) outside the uterus, which can lead to scarring, adhesions, pain, and fertility issues. While numerous factors associated with endometriosis are documented, a wide range of symptoms may still be undiscovered.

METHODS

In this study, we employed machine learning algorithms to predict endometriosis based on the patient symptoms extracted from 13,933 questionnaires. We compared the results of feature selection obtained from various algorithms (i.e., Boruta algorithm, Recursive Feature Selection) with experts' decisions. As a benchmark model architecture, we utilized a LightGBM algorithm, along with Multivariate Imputation by Chained Equations (MICE) and k-nearest neighbors (KNN), for missing data imputation. Our primary objective was to assess the model's performance and feature importance compared to existing studies.

RESULTS

We identified the top 20 predictors of endometriosis, uncovering previously overlooked features such as Cesarean section, ovarian cysts, and hernia. Notably, the model's performance metrics were maximized when utilizing a combination of multiple feature selection methods. Specifically, the final model achieved an area under the receiver operating characteristic curve (AUC) of 0.85 on the training dataset and an AUC of 0.82 on the testing dataset.

CONCLUSIONS

The application of machine learning in diagnosing endometriosis has the potential to significantly impact clinical practice, streamlining the diagnostic process and enhancing efficiency. Our questionnaire-based prediction approach empowers individuals with endometriosis to proactively identify potential symptoms, facilitating informed discussions with healthcare professionals about diagnosis and treatment options.

Gan Q, Gong L, Hu D, Jiang Y, Ding X. A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset. SENSORS (BASEL, SWITZERLAND) 2023;23:8678. [PMID: 37960379 PMCID: PMC10650138 DOI: 10.3390/s23218678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/07/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023]

Shadbahr T, Roberts M, Stanczuk J, Gilbey J, Teare P, Dittmer S, Thorpe M, Torné RV, Sala E, Lió P, Patel M, Preller J, Rudd JHF, Mirtti T, Rannikko AS, Aston JAD, Tang J, Schönlieb CB. The impact of imputation quality on machine learning classifiers for datasets with missing values. COMMUNICATIONS MEDICINE 2023;3:139. [PMID: 37803172 PMCID: PMC10558448 DOI: 10.1038/s43856-023-00356-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/18/2022] [Accepted: 09/13/2023] [Indexed: 10/08/2023] Open

Affiliation(s)

Tolou Shadbahr Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Michael Roberts Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK. Data Science & Artificial Intelligence, AstraZeneca, Cambridge, UK.
Jan Stanczuk Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
Julian Gilbey Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
Philip Teare Data Science & Artificial Intelligence, AstraZeneca, Cambridge, UK
Sören Dittmer Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK ZeTeM, University of Bremen, Bremen, Germany
Matthew Thorpe Department of Mathematics, University of Manchester, Manchester, UK
Ramon Viñas Torné Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
Evis Sala Department of Radiology, University of Cambridge, Cambridge, UK
Pietro Lió Department of Mathematics, University of Manchester, Manchester, UK
Mishal Patel Data Science & Artificial Intelligence, AstraZeneca, Cambridge, UK Clinical Pharmacology & Safety Sciences, AstraZeneca, Cambridge, UK
Jacobus Preller Addenbrooke's Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
James H F Rudd Department of Medicine, University of Cambridge, Cambridge, UK
Tuomas Mirtti Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland iCAN-Digital Precision Cancer Medicine Flagship, Helsinki, Finland
Antti Sakari Rannikko Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland iCAN-Digital Precision Cancer Medicine Flagship, Helsinki, Finland Department of Urology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
John A D Aston Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
Jing Tang Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Carola-Bibiane Schönlieb Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK

Tumusiime AG, Eyobu OS, Mugume I, Oyana TJ. A weather features dataset for prediction of short-term rainfall quantities in Uganda. Data Brief 2023;50:109613. [PMID: 37808539 PMCID: PMC10551829 DOI: 10.1016/j.dib.2023.109613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/11/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023] Open

Altuhaifa FA, Win KT, Su G. Predicting lung cancer survival based on clinical data using machine learning: A review. Comput Biol Med 2023;165:107338. [PMID: 37625260 DOI: 10.1016/j.compbiomed.2023.107338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 07/31/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023]

Liu R, Wang Z, Qiu J, Wang X. Assigning channel weights using an attention mechanism: an EEG interpolation algorithm. Front Neurosci 2023;17:1251677. [PMID: 37811329 PMCID: PMC10552919 DOI: 10.3389/fnins.2023.1251677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 09/06/2023] [Indexed: 10/10/2023] Open

Kostekci YE, Bakırarar B, Okulu E, Erdeve O, Atasay B, Arsan S. An Early Prediction Model for Estimating Bronchopulmonary Dysplasia in Preterm Infants. Neonatology 2023;120:709-717. [PMID: 37725910 DOI: 10.1159/000533299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 07/22/2023] [Indexed: 09/21/2023]

Abnoosian K, Farnoosh R, Behzadi MH. Prediction of diabetes disease using an ensemble of machine learning multi-classifier models. BMC Bioinformatics 2023;24:337. [PMID: 37697283 PMCID: PMC10496262 DOI: 10.1186/s12859-023-05465-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 09/04/2023] [Indexed: 09/13/2023] Open

Abstract

BACKGROUND AND OBJECTIVE

Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled data, frequent missing values, and dataset imbalance hinder the development of accurate prediction models. Therefore, a novel framework is required to address these challenges and improve performance.

METHODS

In this study, we propose an innovative pipeline-based multi-classification framework to predict diabetes in three classes: diabetic, non-diabetic, and prediabetes, using the imbalanced Iraqi Patient Dataset of Diabetes. Our framework incorporates various pre-processing techniques, including duplicate sample removal, attribute conversion, missing value imputation, data normalization and standardization, feature selection, and k-fold cross-validation. Furthermore, we implement multiple machine learning models, such as k-NN, SVM, DT, RF, AdaBoost, and GNB, and introduce a weighted ensemble approach based on the Area Under the Receiver Operating Characteristic Curve (AUC) to address dataset imbalance. Performance optimization is achieved through grid search and Bayesian optimization for hyper-parameter tuning.

RESULTS

Our proposed model outperforms other machine learning models, including k-NN, SVM, DT, RF, AdaBoost, and GNB, in predicting diabetes. The model achieves high average accuracy, precision, recall, F1-score, and AUC values of 0.9887, 0.9861, 0.9792, 0.9851, and 0.999, respectively.

CONCLUSION

Our pipeline-based multi-classification framework demonstrates promising results in accurately predicting diabetes using an imbalanced dataset of Iraqi diabetic patients. The proposed framework addresses the challenges associated with limited labeled data, missing values, and dataset imbalance, leading to improved prediction performance. This study highlights the potential of machine learning techniques in diabetes diagnosis and management, and the proposed framework can serve as a valuable tool for accurate prediction and improved patient care. Further research can build upon our work to refine and optimize the framework and explore its applicability in diverse datasets and populations.
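
The AUC-based weighted ensemble mentioned in the METHODS above can be illustrated with a small sketch. The abstract does not spell out the weighting rule, so this example assumes the simplest reading: each model's weight is its AUC divided by the sum of all AUCs. It also reduces the three-class problem to a binary one for brevity. The function names, labels, and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (pairwise definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_weighted_ensemble(labels, model_scores):
    """Weight each model by its share of the total AUC (an assumed rule),
    then return the weights and the weighted average score per sample."""
    aucs = [auc(labels, scores) for scores in model_scores]
    total = sum(aucs)
    weights = [a / total for a in aucs]
    combined = [
        sum(w * scores[i] for w, scores in zip(weights, model_scores))
        for i in range(len(labels))
    ]
    return weights, combined


# Hypothetical validation labels and predicted probabilities from two models.
labels = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]
model_b = [0.2, 0.3, 0.6, 0.9]

weights, combined = auc_weighted_ensemble(labels, [model_a, model_b])
print("weights:", weights)
print("ensemble AUC:", auc(labels, combined))
```

In practice the weights would be estimated on held-out folds rather than on the same samples used for the final evaluation, since reusing the data makes the ensemble's AUC look better than it is.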

Drosouli I, Voulodimos A, Mastorocostas P, Miaoulis G, Ghazanfarpour D. A Spatial-Temporal Graph Convolutional Recurrent Network for Transportation Flow Estimation. SENSORS (BASEL, SWITZERLAND) 2023;23:7534. [PMID: 37687992 PMCID: PMC10490678 DOI: 10.3390/s23177534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 08/25/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023]

Habenicht R, Fehrmann E, Blohm P, Ebenbichler G, Fischer-Grote L, Kollmitzer J, Mair P, Kienbacher T. Machine Learning Based Linking of Patient Reported Outcome Measures to WHO International Classification of Functioning, Disability, and Health Activity/Participation Categories. J Clin Med 2023;12:5609. [PMID: 37685676 PMCID: PMC10488436 DOI: 10.3390/jcm12175609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/06/2023] [Accepted: 08/23/2023] [Indexed: 09/10/2023] Open

Abstract

BACKGROUND

In the primary and secondary medical health sector, patient reported outcome measures (PROMs) are widely used to assess a patient's disease-related functional health state. However, the World Health Organization (WHO), in its recently adopted resolution on "strengthening rehabilitation in all health systems", encourages that all health sectors, not only the rehabilitation sector, classify a patient's functioning and health state according to the International Classification of Functioning, Disability and Health (ICF).

AIM

This research sought to optimize machine learning (ML) methods that fully and automatically link information collected from PROMs in persons with unspecific chronic low back pain (cLBP) to limitations in activities and restrictions in participation that are listed in the WHO core set categories for LBP. The study also aimed to identify the minimal set of PROMs necessary for linking without compromising performance.

METHODS

A total of 806 patients with cLBP completed a comprehensive set of validated PROMs and were interviewed by clinical psychologists who assessed patients' performance in activity limitations and restrictions in participation according to the ICF brief core set for low back pain (LBP). The information collected was then utilized to further develop random forest (RF) methods that classified the presence or absence of a problem within each of the activity participation ICF categories of the ICF core set for LBP. Further analyses identified those PROM items relevant to the linking process and validated the respective linking performance that utilized a minimal subset of items.

RESULTS

Compared to a recently developed ML linking method, receiver operating characteristic curve (ROC-AUC) values for the novel RF methods showed overall improved performance, with AUC values ranging from 0.73 for the ICF category d850 to 0.81 for the ICF category d540. Variable importance measurements revealed that minimal subsets of either 24 or 15 important PROM variables (out of 80 items included in the full set of PROMs) would show similar linking performance.

CONCLUSIONS

Findings suggest that our optimized ML-based methods more accurately predict the presence or absence of limitations and restrictions listed in ICF core categories for cLBP. In addition, this accurate performance would not suffer if the list of PROM items was reduced to a minimum of 15 out of 80 items assessed.

Chaumeil M, Guglielmetti C, Qiao K, Tiret B, Ozen M, Krukowski K, Nolan A, Paladini MS, Lopez C, Rosi S. Hyperpolarized ¹³C metabolic imaging detects long-lasting metabolic alterations following mild repetitive traumatic brain injury. RESEARCH SQUARE 2023:rs.3.rs-3166656. [PMID: 37645937 PMCID: PMC10462249 DOI: 10.21203/rs.3.rs-3166656/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/31/2023]

Timilsina M, Fey D, Buosi S, Janik A, Costabello L, Carcereny E, Abreu DR, Cobo M, Castro RL, Bernabé R, Minervini P, Torrente M, Provencio M, Nováček V. Synergy between imputed genetic pathway and clinical information for predicting recurrence in early stage non-small cell lung cancer. J Biomed Inform 2023;144:104424. [PMID: 37352900 DOI: 10.1016/j.jbi.2023.104424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 06/06/2023] [Accepted: 06/11/2023] [Indexed: 06/25/2023]