1
Rakers MM, van Buchem MM, Kucenko S, de Hond A, Kant I, van Smeden M, Moons KGM, Leeuwenberg AM, Chavannes N, Villalobos-Quesada M, van Os HJA. Availability of Evidence for Predictive Machine Learning Algorithms in Primary Care: A Systematic Review. JAMA Netw Open 2024; 7:e2432990. [PMID: 39264624] [PMCID: PMC11393722] [DOI: 10.1001/jamanetworkopen.2024.32990]
Abstract
Importance The aging and multimorbid population and health personnel shortages pose a substantial burden on primary health care. While predictive machine learning (ML) algorithms have the potential to address these challenges, concerns remain about a lack of transparency and insufficient reporting of model validation and of the effectiveness of implementation in the clinical workflow. Objectives To systematically identify predictive ML algorithms implemented in primary care from peer-reviewed literature and US Food and Drug Administration (FDA) and Conformité Européenne (CE) registration databases and to ascertain the public availability of evidence, including peer-reviewed literature, gray literature, and technical reports across the artificial intelligence (AI) life cycle. Evidence Review PubMed, Embase, Web of Science, Cochrane Library, Emcare, Academic Search Premier, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI.org (Association for the Advancement of Artificial Intelligence), arXiv, Epistemonikos, PsycINFO, and Google Scholar were searched for studies published between January 2000 and July 2023, with search terms related to AI, primary care, and implementation. The search extended to CE-marked or FDA-approved predictive ML algorithms obtained from relevant registration databases. Three reviewers gathered subsequent evidence using strategies such as product searches, exploration of references, manufacturer website visits, and direct inquiries to authors and product owners. The extent to which the evidence for each predictive ML algorithm aligned with the Dutch AI predictive algorithm (AIPA) guideline requirements was assessed per AI life cycle phase, producing evidence availability scores. Findings The systematic search identified 43 predictive ML algorithms, of which 25 were commercially available and CE-marked or FDA-approved. The predictive ML algorithms spanned multiple clinical domains, but most (27 [63%]) focused on cardiovascular diseases and diabetes.
Most (35 [81%]) were published within the past 5 years. The availability of evidence varied across the phases of the predictive ML algorithm life cycle, with evidence reported least often for phase 1 (preparation) and phase 5 (impact assessment) (19% and 30%, respectively). Twelve (28%) predictive ML algorithms achieved approximately half of their maximum individual evidence availability score. Overall, predictive ML algorithms from the peer-reviewed literature showed higher evidence availability than those from FDA-approved or CE-marked databases (45% vs 29%). Conclusions and Relevance The findings indicate an urgent need to improve the availability of evidence on the quality criteria for predictive ML algorithms. Adopting the Dutch AIPA guideline could facilitate transparent and consistent reporting of these quality criteria, which could foster trust among end users and facilitate large-scale implementation.
Affiliation(s)
- Margot M Rakers
- Department of Public Health and Primary Care, Leiden University Medical Centre, ZA Leiden, the Netherlands
- National eHealth Living Lab, Leiden University Medical Centre, ZA Leiden, the Netherlands
- Marieke M van Buchem
- Department of Information Technology and Digital Innovation, Leiden University Medical Center, ZA Leiden, the Netherlands
- Sergej Kucenko
- Hamburg University of Applied Sciences, Department of Health Sciences, Ulmenliet 20, Hamburg, Germany
- Anne de Hond
- Department of Digital Health, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
- Ilse Kant
- Department of Digital Health, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
- Artuur M Leeuwenberg
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Universiteitsweg 100, CG Utrecht, the Netherlands
- Niels Chavannes
- Department of Public Health and Primary Care, Leiden University Medical Centre, ZA Leiden, the Netherlands
- National eHealth Living Lab, Leiden University Medical Centre, ZA Leiden, the Netherlands
- María Villalobos-Quesada
- Department of Public Health and Primary Care, Leiden University Medical Centre, ZA Leiden, the Netherlands
- National eHealth Living Lab, Leiden University Medical Centre, ZA Leiden, the Netherlands
- Hendrikus J A van Os
- Department of Public Health and Primary Care, Leiden University Medical Centre, ZA Leiden, the Netherlands
- National eHealth Living Lab, Leiden University Medical Centre, ZA Leiden, the Netherlands
2
Askar M, Tafavvoghi M, Småbrekke L, Bongo LA, Svendsen K. Using machine learning methods to predict all-cause somatic hospitalizations in adults: A systematic review. PLoS One 2024; 19:e0309175. [PMID: 39178283] [PMCID: PMC11343463] [DOI: 10.1371/journal.pone.0309175]
Abstract
AIM In this review, we investigated how machine learning (ML) has been used to predict all-cause somatic hospital admissions and readmissions in adults. METHODS We searched eight databases (PubMed, Embase, Web of Science, CINAHL, ProQuest, OpenGrey, WorldCat, and MedNar) from their inception to October 2023 and included records that predicted all-cause somatic hospital admissions and readmissions of adults using ML methodology. We used the CHARMS checklist for data extraction, PROBAST for bias and applicability assessment, and TRIPOD for reporting quality. RESULTS We screened 7,543 studies, of which 163 full-text records were read and 116 met the review inclusion criteria. Among these, 45 predicted admission, 70 predicted readmission, and one study predicted both. There was substantial variety in the types of datasets, algorithms, features, data preprocessing steps, and evaluation and validation methods. The most used types of features were demographics, diagnoses, vital signs, and laboratory tests. Area under the ROC curve (AUC) was the most used evaluation metric. Models trained with boosting tree-based algorithms often performed better than others, and ML algorithms commonly outperformed traditional regression techniques. Sixteen studies used natural language processing (NLP) of clinical notes for prediction, and all of them yielded good results. Overall adherence to reporting quality was poor in the reviewed studies, and only five percent of models were implemented in clinical practice. The methodological aspects most frequently addressed inadequately were providing model interpretations at the individual patient level, full code availability, external validation, model calibration, and handling of class imbalance. CONCLUSION This review identified considerable concerns regarding methodological issues and reporting quality in studies investigating ML to predict hospitalizations. To ensure the acceptability of these models in clinical settings, it is crucial to improve the quality of future studies.
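As a hedged sketch of the practices this review examines, the snippet below trains a boosting model and a class-weighted logistic regression on an imbalanced synthetic "admission" outcome and reports both discrimination (AUC) and calibration (Brier score), the two aspects the review found most often neglected. All data, features, and prevalences are invented for illustration; this is not any reviewed study's pipeline.

```python
# Illustrative only: synthetic data standing in for demographics, vitals, labs.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # five invented predictors
p = 1 / (1 + np.exp(-(X[:, 0] - 2.2)))           # ~13% event rate (imbalanced)
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

boost = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
logit = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

for name, model in [("boosting", boost), ("logistic", logit)]:
    prob = model.predict_proba(X_te)[:, 1]
    print(name,
          "AUC=%.3f" % roc_auc_score(y_te, prob),       # discrimination
          "Brier=%.3f" % brier_score_loss(y_te, prob))  # calibration
```

Reporting both metrics on held-out data, rather than AUC alone on the training set, addresses two of the gaps the review flags.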
Affiliation(s)
- Mohsen Askar
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
- Masoud Tafavvoghi
- Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway
- Lars Småbrekke
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
- Lars Ailo Bongo
- Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway
- Kristian Svendsen
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
3
Feller D, Wingbermuhle R, Berg B, Vigdal ØN, Innocenti T, Grotle M, Ostelo R, Chiarotto A. Improvements Are Needed in the Adherence to the TRIPOD Statement for Clinical Prediction Models for Patients With Spinal Pain or Osteoarthritis: A Metaresearch Study. J Pain 2024:104624. [PMID: 39002741] [DOI: 10.1016/j.jpain.2024.104624]
Abstract
This metaresearch study aimed to evaluate the completeness of reporting of prediction model studies in patients with spinal pain or osteoarthritis (OA), in terms of adherence to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement. We searched for prognostic and diagnostic prediction models in patients with spinal pain or OA in MEDLINE, Embase, Web of Science, and CINAHL. Using a standardized assessment form, we assessed the adherence of the included studies to the TRIPOD. Two independent reviewers performed the study selection and data extraction phases. We included 66 studies. Approximately 35% of the studies reported using the TRIPOD. The median adherence to the TRIPOD was 59% overall (interquartile range [IQR] 21.8), with the items of the methods and results sections being the worst reported. Studies on neck pain had better adherence to the TRIPOD than studies on back pain and OA (medians of 76.5%, 59%, and 53%, respectively). External validation studies had the highest total adherence (median: 79.5%, IQR: 12.8) of all the study types. The median overall adherence was 4 percentage points higher in studies that reported TRIPOD use than in those that did not. Finally, we did not observe any improvement in adherence over the years. The adherence to the TRIPOD of prediction models in the spinal and OA fields is low, with the methods and results sections being the most poorly reported. Future studies on prediction models in spinal pain and OA should follow the TRIPOD to improve their reporting completeness. PERSPECTIVE: This article provides data on adherence to the TRIPOD statement in 66 prediction model studies for spinal pain or OA. Adherence to the TRIPOD statement was found to be low (median adherence of 59%). This inadequate reporting may negatively impact the effective use of the models in clinical practice.
Affiliation(s)
- Daniel Feller
- Department of Rehabilitation, Provincial Agency for Health of the Autonomous Province of Trento, Trento, Italy; Department of Human Resources, Provincial Agency for Health of the Autonomous Province of Trento, Trento, Italy; Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands.
- Roel Wingbermuhle
- Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands; Department of Physiotherapy and Rehabilitation sciences, SOMT University of Physiotherapy, Amersfoort, the Netherlands
- Bjørnar Berg
- Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
- Ørjan Nesse Vigdal
- Department of Rehabilitation Science and Health Technology, Faculty of Health Science, OsloMet - Oslo Metropolitan University, Oslo, Norway
- Tiziano Innocenti
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, the Netherlands; GIMBE Foundation, Bologna, Italy
- Margreth Grotle
- Centre for Intelligent Musculoskeletal Health, Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway; Division of Clinical Neuroscience, Department of Research and Innovation, Oslo University Hospital, Oslo, Norway
- Raymond Ostelo
- Department of Health Sciences, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, the Netherlands; Department of Epidemiology and Data Science, Amsterdam UMC, Vrije Universiteit & Amsterdam Movement Sciences, Musculoskeletal Health, Amsterdam, the Netherlands
- Alessandro Chiarotto
- Department of General Practice, Erasmus MC, University Medical Center, Rotterdam, the Netherlands
4
Cai YQ, Gong DX, Tang LY, Cai Y, Li HJ, Jing TC, Gong M, Hu W, Zhang ZW, Zhang X, Zhang GW. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J Med Internet Res 2024; 26:e47645. [PMID: 38869157] [PMCID: PMC11316160] [DOI: 10.2196/47645]
Abstract
In recent years, there has been explosive development in artificial intelligence (AI), which has been widely applied in the health care field. As a typical AI technology, machine learning models have shown great potential in predicting cardiovascular diseases by leveraging large amounts of medical data for training and optimization, and they are expected to play a crucial role in reducing the incidence and mortality rates of cardiovascular diseases. Although the field has become a research hot spot, there are still many pitfalls that researchers need to pay close attention to. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and limiting the prospects for clinical application. Identifying and avoiding these pitfalls is therefore a crucial task before undertaking such research, yet a comprehensive summary of the topic has been lacking. This viewpoint analyzes the existing problems in terms of data quality, data set characteristics, model design, statistical methods, and clinical implications, and proposes possible solutions: gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting with statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability. The goal is to offer reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
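One pitfall the viewpoint raises, overfitting on small samples, can be shown concretely. The sketch below uses pure noise data (invented, not from the paper): apparent training-set AUC looks impressive, while cross-validation reveals performance near chance.

```python
# Synthetic demonstration: many features, few patients, zero real signal.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(80, 30))        # 80 "patients", 30 noise features
y = rng.integers(0, 2, size=80)      # random binary outcome

model = LogisticRegression(max_iter=1000).fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])          # optimistic
cv = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()  # honest

print(f"apparent AUC {apparent:.2f} vs cross-validated AUC {cv:.2f}")
```

The gap between the two numbers is exactly the optimism that resampling-based validation (or a held-out set) is meant to expose.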
Affiliation(s)
- Yu-Qing Cai
- The First Hospital of China Medical University, Shenyang, China
- Da-Xin Gong
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Li-Ying Tang
- The First Hospital of China Medical University, Shenyang, China
- Yue Cai
- The First Hospital of China Medical University, Shenyang, China
- Hui-Jun Li
- Shenyang Medical & Film Science and Technology Co, Ltd, Shenyang, China
- Tian-Ci Jing
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
- Wei Hu
- Bayi Orthopedic Hospital, Chengdu, China
- Zhen-Wei Zhang
- China Rongtong Medical & Healthcare Co, Ltd, Chengdu, China
- Xingang Zhang
- Department of Cardiology, The First Hospital of China Medical University, Shenyang, China
- Guang-Wei Zhang
- Smart Hospital Management Department, The First Hospital of China Medical University, Shenyang, China
5
Zheng C, Yue P, Cao K, Wang Y, Zhang C, Zhong J, Xu X, Lin C, Liu Q, Zou Y, Huang B. Predicting intraoperative blood loss during cesarean sections based on multi-modal information: a two-center study. Abdom Radiol (NY) 2024; 49:2325-2339. [PMID: 38896245] [DOI: 10.1007/s00261-024-04419-0]
Abstract
PURPOSE To develop and validate a nomogram model that combines radiomics features, clinical factors, and coagulation function indexes (CFI) to predict intraoperative blood loss (IBL) during cesarean sections, and to explore its application in optimizing perioperative management and reducing maternal morbidity. METHODS In this retrospective consecutive series study, a total of 346 patients who underwent magnetic resonance imaging (156 for training and 68 for internal testing, center 1; 122 for external testing, center 2) were included. IBL+ was defined as estimated blood loss of more than 1000 mL during cesarean section. Prediction models for IBL were developed with machine-learning algorithms using CFI, radiomics features, and clinical factors, and ROC analysis was performed to evaluate their performance for IBL diagnosis. RESULTS The support vector machine model incorporating all three modalities achieved an AUC of 0.873 (95% CI 0.769-0.941) and a sensitivity of 1.000 (95% CI 0.846-1.000) in the internal test set, and an AUC of 0.806 (95% CI 0.725-0.872) and a sensitivity of 0.873 (95% CI 0.799-0.922) in the external test set. It also scored significantly higher than the CFI model (P = 0.035) on the internal test set, and than both the CFI (P = 0.002) and radiomics-CFI (P = 0.007) models on the external test set. Additionally, the nomogram constructed from the three modalities achieved an internal test set AUC of 0.960 (95% CI 0.806-0.999) and an external test set AUC of 0.869 (95% CI 0.684-0.967) in pregnant patients without pernicious placenta previa. Notably, the AUC of the proposed model was not statistically significantly higher than that of the Clinical-CFI model in either the internal (P = 0.115) or the external (P = 0.533) test set. CONCLUSION The proposed model demonstrated good performance in predicting intraoperative blood loss (IBL), with high sensitivity and robust generalizability, and may be applicable to other settings such as vaginal delivery and postpartum hysterectomy. However, its performance was not statistically significantly better than that of the Clinical-CFI model.
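To make the multimodal design concrete, here is a hypothetical sketch of fusing radiomics, clinical, and CFI feature blocks and scoring an SVM on held-out data. The feature counts, labels, and data are all invented; only the cohort size (346) and the three-block design echo the abstract, and this is not the authors' pipeline.

```python
# Illustrative multimodal fusion: concatenate feature blocks, then classify.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 346                                   # cohort size from the abstract
radiomics = rng.normal(size=(n, 20))      # invented feature counts
clinical = rng.normal(size=(n, 6))
cfi = rng.normal(size=(n, 4))
X = np.hstack([radiomics, clinical, cfi]) # fused multimodal feature matrix
y = (clinical[:, 0] + cfi[:, 0] > 0).astype(int)  # synthetic IBL+ label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
svm = make_pipeline(StandardScaler(), SVC(probability=True, random_state=1))
svm.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

Standardizing before the SVM matters here because the three blocks would otherwise arrive on very different scales.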
Affiliation(s)
- Changye Zheng
- Department of Radiology, The Tenth Affiliated Hospital of Southern Medical University (Dongguan People's Hospital), Dongguan, Guangdong, China
- Peiyan Yue
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Kangyang Cao
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Ya Wang
- Dongguan Maternal and Child Health Care Hospital, Dongguan, Guangdong, China
- Chang Zhang
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Jian Zhong
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Xiaoyang Xu
- Department of Radiology, The Tenth Affiliated Hospital of Southern Medical University (Dongguan People's Hospital), Dongguan, Guangdong, China
- Chuxuan Lin
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
- Qinghua Liu
- Dongguan Maternal and Child Health Care Hospital, Dongguan, Guangdong, China
- Yujian Zou
- Department of Radiology, The Tenth Affiliated Hospital of Southern Medical University (Dongguan People's Hospital), Dongguan, Guangdong, China
- Bingsheng Huang
- Medical AI Lab, School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China
6
Lyman GH, Kuderer NM. Artificial Intelligence in Cancer Clinical Research: II. Development and Validation of Clinical Prediction Models. Cancer Invest 2024; 42:447-451. [PMID: 38775011] [DOI: 10.1080/07357907.2024.2354991]
Affiliation(s)
- Gary H Lyman
- Editor-in-Chief, Cancer Investigation Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Nicole M Kuderer
- Deputy Editor, Cancer Investigation Advanced Cancer Research Group, Kirkland, WA, USA
7
Khalid SI, Massaad E, Roy JM, Thomson K, Mirpuri P, Kiapour A, Shin JH. An Appraisal of the Quality of Development and Reporting of Predictive Models in Neurosurgery: A Systematic Review. Neurosurgery 2024:00006123-990000000-01255. [PMID: 38940578] [DOI: 10.1227/neu.0000000000003074]
Abstract
BACKGROUND AND OBJECTIVES Significant evidence indicates that the reporting quality of novel predictive models is poor because of confounding by small data sets, inappropriate statistical analyses, and a lack of validation and reproducibility. The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement was developed to increase the generalizability of predictive models. This study evaluated the quality of predictive models reported in the neurosurgical literature through their compliance with the TRIPOD guidelines. METHODS Articles reporting prediction models published in the top 5 neurosurgery journals by SCImago Journal Rank-2 (Neurosurgery, Journal of Neurosurgery, Journal of Neurosurgery: Spine, Journal of NeuroInterventional Surgery, and Journal of Neurology, Neurosurgery, and Psychiatry) between January 1st, 2018, and January 1st, 2023, were identified through a PubMed search strategy that combined terms related to machine learning and prediction modeling. These original research articles were analyzed against the TRIPOD criteria. RESULTS A total of 110 articles were assessed with the TRIPOD checklist. The median compliance was 57.4% (IQR: 50.0%-66.7%). Machine learning-based models exhibited lower compliance on average than conventional statistical models (57.1% [50.0%-66.7%] vs 68.1% [50.2%-68.1%], P = .472). Among the TRIPOD criteria, the lowest compliance was observed for blinding the assessment of predictors and outcomes (n = 7, 12.7% and n = 10, 16.9%, respectively), inclusion of an informative title (n = 17, 15.6%), and reporting of model performance measures such as confidence intervals (n = 27, 24.8%). Few studies provided sufficient information to allow external validation of their results (n = 26, 25.7%). CONCLUSION Published predictive models in neurosurgery commonly fall short of the guidelines laid out by TRIPOD for optimal development, validation, and reporting. This lack of compliance may reflect the limited extent to which these models have been subjected to external validation or adopted into routine clinical practice in neurosurgery.
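The compliance summary used in appraisals like this one is straightforward to reproduce. Below is a toy illustration with an invented article-by-checklist-item matrix (the item count and all scores are assumptions, not the review's extracted data): per-article compliance is the fraction of checklist items met, summarized as a median with IQR.

```python
# Toy TRIPOD-style compliance scoring on invented data.
import numpy as np

rng = np.random.default_rng(7)
# 110 articles x 37 checklist items; 1 = item adequately reported (synthetic)
checklist = rng.integers(0, 2, size=(110, 37))
compliance = checklist.mean(axis=1) * 100   # per-article compliance in %

median = np.median(compliance)
q1, q3 = np.percentile(compliance, [25, 75])
print(f"median compliance {median:.1f}% (IQR {q1:.1f}%-{q3:.1f}%)")
```

Grouping the rows (e.g. ML vs conventional models) and summarizing each group the same way yields the kind of subgroup comparison reported above.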
Affiliation(s)
- Syed I Khalid
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Neurosurgery, University of Illinois at Chicago, Chicago, Illinois, USA
- Elie Massaad
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- Joanna Mary Roy
- Department of Neurosurgery, University of Illinois at Chicago, Chicago, Illinois, USA
- Kyle Thomson
- Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA
- Pranav Mirpuri
- Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA
- Ali Kiapour
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts, USA
- John H Shin
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts, USA
8
Chen L, Shao X, Yu P. Machine learning prediction models for diabetic kidney disease: systematic review and meta-analysis. Endocrine 2024; 84:890-902. [PMID: 38141061] [DOI: 10.1007/s12020-023-03637-8]
Abstract
BACKGROUND Machine learning (ML) is increasingly recognized as a viable approach for identifying risk factors associated with diabetic kidney disease (DKD). However, real-world research currently lacks a comprehensive systematic analysis of the predictive performance of ML models for DKD. OBJECTIVES The objectives of this study were to systematically summarize the predictive capabilities of various ML methods in forecasting the onset and advancement of DKD, and to provide a basic outline for ML methods in DKD. METHODS We searched mainstream databases, including PubMed, Web of Science, Embase, and MEDLINE, to obtain eligible studies. We then categorized the ML techniques and analyzed the differences in their performance in predicting DKD. RESULTS Logistic regression (LR) was the most prevalent ML method, yielding an overall pooled area under the receiver operating characteristic curve (AUROC) of 0.83. The non-LR models also performed well, with an overall pooled AUROC of 0.80. Our t tests showed no statistically significant difference in predictive ability between LR and non-LR models (t = 1.6767, p > 0.05). CONCLUSION All ML prediction models yielded reasonably good predictive ability for DKD, with AUROCs greater than 0.7. However, we found no evidence that non-LR models outperformed the LR model. Given that LR performs well in practice and is known for its algorithmic simplicity and computational efficiency, it may be considered a cost-effective ML model for this task.
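The LR versus non-LR comparison described above can be sketched as follows. The study-level AUROCs are invented for illustration, and a plain unweighted mean stands in for formal meta-analytic pooling (the review's actual pooling method is not specified here):

```python
# Hypothetical study-level AUROCs; not the review's extracted data.
import numpy as np
from scipy import stats

lr_aucs = np.array([0.81, 0.85, 0.79, 0.88, 0.83, 0.82])      # LR studies
non_lr_aucs = np.array([0.78, 0.84, 0.80, 0.76, 0.82, 0.79])  # non-LR studies

pooled_lr, pooled_non = lr_aucs.mean(), non_lr_aucs.mean()
t, p = stats.ttest_ind(lr_aucs, non_lr_aucs, equal_var=False)  # Welch t-test
print(f"pooled LR {pooled_lr:.2f} vs non-LR {pooled_non:.2f}; "
      f"t={t:.2f}, p={p:.3f}")
```

A non-significant p-value here, as in the review, means the data do not support a performance difference between the two model families.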
Affiliation(s)
- Lianqin Chen
- NHC Key Laboratory of Hormones and Development, Tianjin Key Laboratory of Metabolic Diseases, Chu Hsien-I Memorial Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin, 300134, China
- Xian Shao
- NHC Key Laboratory of Hormones and Development, Tianjin Key Laboratory of Metabolic Diseases, Chu Hsien-I Memorial Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin, 300134, China
- Pei Yu
- NHC Key Laboratory of Hormones and Development, Tianjin Key Laboratory of Metabolic Diseases, Chu Hsien-I Memorial Hospital & Tianjin Institute of Endocrinology, Tianjin Medical University, Tianjin, 300134, China
9
Andaur Navarro CL, Damen JAA, Ghannad M, Dhiman P, van Smeden M, Reitsma JB, Collins GS, Riley RD, Moons KGM, Hooft L. SPIN-PM: a consensus framework to evaluate the presence of spin in studies on prediction models. J Clin Epidemiol 2024; 170:111364. [PMID: 38631529] [DOI: 10.1016/j.jclinepi.2024.111364]
Abstract
OBJECTIVES To develop a framework to identify and evaluate spin practices and their facilitators in studies on clinical prediction models, regardless of the modeling technique. STUDY DESIGN AND SETTING We followed a three-phase consensus process: (1) a premeeting literature review to generate candidate items; (2) a series of structured meetings in which a panel of experienced researchers discussed comments and exchanged viewpoints on the items to be included; and (3) a postmeeting review of the final list of items and examples. Through this iterative consensus process, a framework was derived once all panel researchers agreed. RESULTS The consensus process involved a panel of eight researchers and resulted in SPIN-PM, which consists of two categories of spin (misleading interpretation and misleading transportability) and, within these categories, two forms of spin (spin practices and facilitators of spin). We provide criteria and examples. CONCLUSION We propose this guidance to facilitate not only accurate reporting but also accurate interpretation and extrapolation of clinical prediction models, which will likely improve the reporting quality of subsequent research and reduce research waste.
Affiliation(s)
- Constanza L Andaur Navarro
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
- Johanna A A Damen
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Mona Ghannad
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Johannes B Reitsma
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Karel G M Moons
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
10
Kuziemsky CE, Chrimes D, Minshall S, Mannerow M, Lau F. AI Quality Standards in Health Care: Rapid Umbrella Review. J Med Internet Res 2024; 26:e54705. [PMID: 38776538 PMCID: PMC11153979 DOI: 10.2196/54705]
Abstract
BACKGROUND In recent years, there has been an upwelling of artificial intelligence (AI) studies in the health care literature. During this period, there has been an increasing number of proposed standards to evaluate the quality of health care AI studies. OBJECTIVE This rapid umbrella review examines the use of AI quality standards in a sample of health care AI systematic review articles published over a 36-month period. METHODS We used a modified version of the Joanna Briggs Institute umbrella review method. Our rapid approach was informed by the practical guide by Tricco and colleagues for conducting rapid reviews. Our search was focused on the MEDLINE database supplemented with Google Scholar. The inclusion criteria were English-language systematic reviews regardless of review type, with mention of AI and health in the abstract, published during a 36-month period. For the synthesis, we summarized the AI quality standards used and issues noted in these reviews drawing on a set of published health care AI standards, harmonized the terms used, and offered guidance to improve the quality of future health care AI studies. RESULTS We selected 33 review articles published between 2020 and 2022 in our synthesis. The reviews covered a wide range of objectives, topics, settings, designs, and results. Over 60 AI approaches across different domains were identified with varying levels of detail spanning different AI life cycle stages, making comparisons difficult. Health care AI quality standards were applied in only 39% (13/33) of the reviews and in 14% (25/178) of the original studies from the reviews examined, mostly to appraise their methodological or reporting quality. Only a handful mentioned the transparency, explainability, trustworthiness, ethics, and privacy aspects. A total of 23 AI quality standard-related issues were identified in the reviews. 
There was a recognized need to standardize the planning, conduct, and reporting of health care AI studies and address their broader societal, ethical, and regulatory implications. CONCLUSIONS Despite the growing number of AI standards to assess the quality of health care AI studies, they are seldom applied in practice. With increasing desire to adopt AI in different health topics, domains, and settings, practitioners and researchers must stay abreast of and adapt to the evolving landscape of health care AI quality standards and apply these standards to improve the quality of their AI studies.
Affiliation(s)
- Dillon Chrimes
- School of Health Information Science, University of Victoria, Victoria, BC, Canada
- Simon Minshall
- School of Health Information Science, University of Victoria, Victoria, BC, Canada
- Francis Lau
- School of Health Information Science, University of Victoria, Victoria, BC, Canada
11
Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Sci Adv 2024; 10:eadk3452. [PMID: 38691601 PMCID: PMC11092361 DOI: 10.1126/sciadv.adk3452]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Emily M. Cantrell
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
- Kenny Peng
- Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Thanh Hien Pham
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Christopher A. Bail
- Department of Sociology, Duke University, Durham, NC 27708, USA
- Department of Political Science, Duke University, Durham, NC 27708, USA
- Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
- Odd Erik Gundersen
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
- Aneo AS, Trondheim, Norway
- Jessica Hullman
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Michael A. Lones
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
- Momin M. Malik
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
- Priyanka Nanayakkara
- Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
- Inioluwa Deborah Raji
- Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael Roberts
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge, Cambridge, UK
- Matthew J. Salganik
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Marta Serra-Garcia
- Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
- Brandon M. Stewart
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Department of Sociology, Princeton University, Princeton, NJ 08544, USA
- Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Department of Politics, Princeton University, Princeton, NJ 08544, USA
- Gilles Vandewiele
- Department of Information Technology, Ghent University, Ghent, Belgium
- Arvind Narayanan
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
12
Dhiman P, Ma J, Kirtley S, Mouka E, Waldron CM, Whittle R, Collins GS. Prediction model protocols indicate better adherence to recommended guidelines for study conduct and reporting. J Clin Epidemiol 2024; 169:111287. [PMID: 38387617 DOI: 10.1016/j.jclinepi.2024.111287]
Abstract
BACKGROUND AND OBJECTIVE Protocols are invaluable documents for any research study, especially for prediction model studies. However, the mere existence of a protocol is insufficient if key details are omitted. We reviewed the reporting content and details of the proposed design and methods reported in published protocols for prediction model research. METHODS We searched MEDLINE, Embase, and the Web of Science Core Collection for protocols for studies developing or validating a diagnostic or prognostic model using any modeling approach in any clinical area. We screened protocols published between Jan 1, 2022 and June 30, 2022. We used the abstract, introduction, methods, and discussion sections of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement to inform data extraction. RESULTS We identified 30 protocols, of which 28 described plans for model development and six for model validation. All protocols were open access, including a preprint. Fifteen protocols reported prospectively collecting data. Twenty-one protocols planned to use clustered data, of which one-third planned methods to account for it. A planned sample size was reported for 93% of development and 67% of validation analyses. Sixteen protocols reported details of study registration, although all protocols reported a statement on ethics approval. Plans for data sharing were reported in 13 protocols. CONCLUSION Protocols for prediction model studies are uncommon, and few are made publicly available. Those that are available were reasonably well reported and often described methods following current prediction model research recommendations, likely leading to better reporting and methods in the actual study.
Affiliation(s)
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Jie Ma
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Shona Kirtley
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Elizabeth Mouka
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Caitlin M Waldron
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK; NIHR Blood and Transplant Research Unit in Data Driven Transfusion Practice, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, OX3 7LD, UK
13
Ingwersen EW, Daams F. Response letter to the editor - Original manuscript: Machine learning versus logistic regression for the prediction of complication after pancreatoduodenectomy. Surgery 2024; 175:1467. [PMID: 38326218 DOI: 10.1016/j.surg.2023.12.025]
Affiliation(s)
- Erik W Ingwersen
- Department of Surgery, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Cancer Center Amsterdam, Amsterdam, the Netherlands; Amsterdam Gastroenterology Endocrinology and Metabolism, Amsterdam, the Netherlands.
- F Daams
- Department of Surgery, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands; Cancer Center Amsterdam, Amsterdam, the Netherlands
14
Cohen JF, Bossuyt PMM. TRIPOD+AI: an updated reporting guideline for clinical prediction models. BMJ 2024; 385:q824. [PMID: 38626949 DOI: 10.1136/bmj.q824]
Affiliation(s)
- Jérémie F Cohen
- Centre of Research in Epidemiology and Statistics (CRESS), INSERM, EPOPé Research Team, Université Paris Cité, 75014 Paris, France
- Department of General Pediatrics and Pediatric Infectious Diseases, Necker-Enfants Malades Hospital, Assistance Publique-Hôpitaux de Paris, Université Paris Cité, Paris, France
- Patrick M M Bossuyt
- Department of Epidemiology and Data Science, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, Netherlands
15
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024; 385:e078378. [PMID: 38626948 PMCID: PMC11019967 DOI: 10.1136/bmj-2023-078378]
Affiliation(s)
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Karel G M Moons
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Paula Dhiman
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Andrew L Beam
- Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Science, Leiden University Medical Centre, Leiden, Netherlands
- Marzyeh Ghassemi
- Department of Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Xiaoxuan Liu
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Johannes B Reitsma
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Maarten van Smeden
- Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht University, Utrecht, Netherlands
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University of Munich and Munich Centre of Machine Learning, Germany
- Jennifer Catherine Camaradou
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Patient representative, University of East Anglia, Faculty of Health Sciences, Norwich Research Park, Norwich, UK
- Leo Anthony Celi
- Beth Israel Deaconess Medical Center, Boston, MA, USA
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
- Spiros Denaxas
- Institute of Health Informatics, University College London, London, UK
- British Heart Foundation Data Science Centre, London, UK
- Alastair K Denniston
- National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
- Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Ben Glocker
- Department of Computing, Imperial College London, London, UK
- Robert M Golub
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Georg Heinze
- Section for Clinical Biometrics, Centre for Medical Data Science, Medical University of Vienna, Vienna, Austria
- Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Emily Lam
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Naomi Lee
- National Institute for Health and Care Excellence, London, UK
- Elizabeth W Loder
- The BMJ, London, UK
- Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Lena Maier-Hein
- Department of Intelligent Medical Systems, German Cancer Research Centre, Heidelberg, Germany
- Bilal A Mateen
- Institute of Health Informatics, University College London, London, UK
- Wellcome Trust, London, UK
- Alan Turing Institute, London, UK
- Melissa D McCradden
- Department of Bioethics, Hospital for Sick Children Toronto, ON, Canada
- Genetics and Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Johan Ordish
- Medicines and Healthcare products Regulatory Agency, London, UK
- Richard Parnell
- Patient representative, Health Data Research UK patient and public involvement and engagement group
- Sherri Rose
- Department of Health Policy and Center for Health Policy, Stanford University, Stanford, CA, USA
- Karandeep Singh
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Laure Wynants
- Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, Netherlands
- Patricia Logullo
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
16
Dos Santos AL, Pinhati C, Perdigão J, Galante S, Silva L, Veloso I, Simões E Silva AC, Oliveira EA. Machine learning algorithms to predict outcomes in children and adolescents with COVID-19: A systematic review. Artif Intell Med 2024; 150:102824. [PMID: 38553164 DOI: 10.1016/j.artmed.2024.102824]
Abstract
BACKGROUND AND OBJECTIVES We aimed to analyze the study designs, modeling approaches, and performance evaluation metrics in studies using machine learning techniques to develop clinical prediction models for children and adolescents with COVID-19. METHODS We searched four databases for articles published between 01/01/2020 and 10/25/2023 describing the development of multivariable prediction models, using any machine learning technique, for predicting several outcomes in children and adolescents who had COVID-19. RESULTS We included ten articles: six (60% [95% confidence interval (CI) 0.31-0.83]) were predictive diagnostic models and four (40% [95% CI 0.17-0.69]) were prognostic models. All models were developed to predict a binary outcome (n = 10/10, 100% [95% CI 0.72-1]). The most frequently predicted outcome was disease detection (n = 3/10, 30% [95% CI 0.11-0.60]). The most commonly used machine learning models in the studies were tree-based (n = 12/33, 36.4% [95% CI 0.17-0.47]) and neural networks (n = 9/27, 33.3% [95% CI 0.15-0.44]). CONCLUSION Our review revealed that attention is required to address problems including small sample sizes, inconsistent reporting practices on data preparation, biases in data sources, and lack of reporting of metrics such as calibration and discrimination, hyperparameters, and other aspects that would allow reproducibility by other researchers and might improve the methodology.
Affiliation(s)
- Adriano Lages Dos Santos
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil; Federal Institute of Education, Science and Technology of Minas Gerais (IFMG), Belo Horizonte, Brazil.
- Clara Pinhati
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Jonathan Perdigão
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Stella Galante
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Ludmilla Silva
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Isadora Veloso
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Ana Cristina Simões E Silva
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
- Eduardo Araújo Oliveira
- Department of Pediatrics, Health Sciences Postgraduate Program, School of Medicine, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
17
Miché M, Strippoli MPF, Preisig M, Lieb R. Evaluating the clinical utility of an easily applicable prediction model of suicide attempts, newly developed and validated with a general community sample of adults. BMC Psychiatry 2024; 24:217. [PMID: 38509477 PMCID: PMC10953234 DOI: 10.1186/s12888-024-05647-w]
Abstract
BACKGROUND A suicide attempt (SA) is a clinically serious action. Researchers have argued that reducing long-term SA risk may be possible, provided that at-risk individuals are identified and receive adequate treatment. Algorithms may accurately identify at-risk individuals. However, the clinical utility of algorithmically estimated long-term SA risk has never been the predominant focus of any study. METHODS The data of this report stem from CoLaus|PsyCoLaus, a prospective longitudinal study of general community adults from Lausanne, Switzerland. Participants (N = 4,097; mean age = 54 years, range: 36-86; 54% female) were assessed up to four times, starting in 2003, approximately every 4-5 years. Long-term individual SA risk was prospectively predicted using logistic regression. This algorithm's clinical utility was assessed by net benefit (NB). Clinical utility expresses a tool's benefit after its potential harm has been taken into account. Net benefit is obtained by weighting the false positives, e.g., 400 individuals, by the odds of the risk threshold (a threshold of 1% yields odds of 1/(100-1) = 1/99), subtracting the result (400 * 1/99 = 4.04) from the true positives, e.g., 5 individuals (5 - 4.04 = 0.96), and dividing the result by the sample size, e.g., 800 (0.96/800). All results are based on 100 internal cross-validations. The predictors used in this study were lifetime SA, any lifetime mental disorder, sex, and age. RESULTS SA at any of the three follow-up assessments was reported by 1.2% of participants. For a range of seven a priori selected threshold probabilities, between 0.5% and 2%, logistic regression showed the highest overall NB in 97.4% of all 700 internal cross-validations (100 for each selected threshold probability). CONCLUSION Despite the strong class imbalance of the outcome (98.8% no, 1.2% yes) and only four predictors, clinical utility was observed.
That is, using the logistic regression model for clinical decision making provided the most true positives, without an increase in false positives, compared with all competing decision strategies. Clinical utility is one among several important prerequisites for implementing an algorithm in routine practice and may guide a clinician's treatment decision making to reduce long-term individual SA risk. The novel metric NB may become a standard performance measure, because the a priori invested clinical considerations enable clinicians to interpret the results directly.
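The net-benefit arithmetic described in the abstract can be expressed as a short function. This is a minimal illustration using the example numbers from the abstract (5 true positives, 400 false positives, N = 800, 1% threshold); the function name and signature are ours, not the authors':

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given risk threshold probability.

    NB = (TP - FP * pt / (1 - pt)) / N, where pt / (1 - pt) are
    the odds of the threshold probability pt. False positives are
    down-weighted by the threshold odds before being subtracted
    from the true positives.
    """
    odds = threshold / (1.0 - threshold)
    return (tp - fp * odds) / n

# Worked example from the abstract: odds of 1% are 1/99, so
# 400 false positives weigh 400/99 = 4.04; (5 - 4.04) / 800.
nb = net_benefit(tp=5, fp=400, n=800, threshold=0.01)
print(round(nb, 4))  # 0.0012
```

A strategy's NB can be compared at the same threshold against "treat all" and "treat none" (NB = 0), which is how "highest overall NB" is judged across the cross-validations.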
Affiliation(s)
- Marcel Miché
- Department of Psychology, Division of Clinical Psychology and Epidemiology, University of Basel, Missionsstrasse 60-62, 4055, Basel, Switzerland.
- Marie-Pierre F Strippoli
- Psychiatric Epidemiology and Psychopathology Research Center, Lausanne University Hospital, University of Lausanne, Prilly, Switzerland
- Martin Preisig
- Psychiatric Epidemiology and Psychopathology Research Center, Lausanne University Hospital, University of Lausanne, Prilly, Switzerland
- Roselind Lieb
- Department of Psychology, Division of Clinical Psychology and Epidemiology, University of Basel, Missionsstrasse 60-62, 4055, Basel, Switzerland
18
Talimtzi P, Ntolkeras A, Kostopoulos G, Bougioukas KI, Pagkalidou E, Ouranidis A, Pataka A, Haidich AB. The reporting completeness and transparency of systematic reviews of prognostic prediction models for COVID-19 was poor: a methodological overview of systematic reviews. J Clin Epidemiol 2024; 167:111264. [PMID: 38266742 DOI: 10.1016/j.jclinepi.2024.111264]
Abstract
OBJECTIVES To conduct a methodological overview of reviews to evaluate the reporting completeness and transparency of systematic reviews (SRs) of prognostic prediction models (PPMs) for COVID-19. STUDY DESIGN AND SETTING MEDLINE, Scopus, the Cochrane Database of Systematic Reviews, and Epistemonikos (epistemonikos.org) were searched for SRs of PPMs for COVID-19 until December 31, 2022. The Risk Of Bias In Systematic reviews (ROBIS) tool was used to assess risk of bias. The protocol for this overview was uploaded to the Open Science Framework (https://osf.io/7y94c). RESULTS Ten SRs were retrieved; none of them synthesized their results in a meta-analysis. Most of the studies had no predefined protocol and were missing information on study selection, the data collection process, and the reporting of the primary studies and models included, while only one SR made its data publicly available. In addition, the overall risk of bias was judged to be high for the majority of the SRs. The overall corrected covered area was 6.3%, indicating only a small amount of overlap among the SRs. CONCLUSION The reporting completeness and transparency of SRs of PPMs for COVID-19 were poor. Guidance is urgently required, with increased awareness and education on minimum reporting standards and quality criteria. Specific focus is needed on predefined protocols, information on study selection and the data collection process, and the reporting of findings to improve the quality of SRs of PPMs for COVID-19.
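The corrected covered area (CCA) reported above is a standard overlap measure for overviews of reviews, computed from a study-by-review citation matrix. As a hedged sketch (the function and the toy matrix below are illustrative, not the overview's actual data), assuming the usual definition CCA = (N - r) / (r*c - r):

```python
def corrected_covered_area(inclusion_matrix):
    """Corrected covered area (CCA) for an overview of reviews.

    inclusion_matrix[i][j] is 1 if primary study i is included
    in systematic review j, else 0. With N total inclusions,
    r unique primary studies (rows), and c reviews (columns):
    CCA = (N - r) / (r * c - r).
    """
    r = len(inclusion_matrix)                       # unique primary studies
    c = len(inclusion_matrix[0])                    # systematic reviews
    n = sum(sum(row) for row in inclusion_matrix)   # total inclusions
    return (n - r) / (r * c - r)

# Toy example: 3 primary studies across 2 reviews; one study
# appears in both reviews, so N=4, r=3, c=2 -> (4-3)/(6-3) = 1/3.
overlap = corrected_covered_area([
    [1, 1],   # study in both reviews
    [1, 0],
    [0, 1],
])
print(overlap)
```

A CCA of 0 means no primary study is shared between reviews; the 6.3% reported in the abstract corresponds to slight overlap on the conventional interpretation scale.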
Affiliation(s)
- Persefoni Talimtzi
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Antonios Ntolkeras
- School of Biology, Aristotle University of Thessaloniki, University Campus, 54636, Thessaloniki, Greece
- Konstantinos I Bougioukas
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Eirini Pagkalidou
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Andreas Ouranidis
- Department of Pharmaceutical Technology, School of Pharmacy, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
- Athanasia Pataka
- Department of Respiratory Deficiency, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
- Anna-Bettina Haidich
- Department of Hygiene, Social-Preventive Medicine and Medical Statistics, School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, University Campus, 54124, Thessaloniki, Greece
19
Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7]
Abstract
BACKGROUND A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation. METHODS PubMed, Web of Science, Embase, and the IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction model risk of bias assessment tool (PROBAST). Subsequently, we designed the IVS for model replicability evaluation, scored in five steps across five items: transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication. The review is registered in PROSPERO (No. CRD42021271789). RESULTS Of 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 distinct algorithms were found; however, 36.4% were used only once and only 39.4% over three times. The number of predictors varied widely (range 5-52,000, median 21), as did sample sizes (range 80-3,660,000, median 4466). All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively.
CONCLUSION AI has led the digital revolution in the field of CVD prediction but is still at an early stage of development, owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and to the development of this field.
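The abstract names the five IVS items but does not reproduce the published scoring rules. As a hedged illustration only, a checklist of this kind might be scored as follows; the item names come from the abstract, while the equal weighting and the cutoffs for "recommended" / "warning" / "not recommended" are hypothetical, not the authors' actual IVS:

```python
# Illustrative checklist scorer in the spirit of the IVS described above.
# Item names are taken from the abstract; the verdict cutoffs are hypothetical.

IVS_ITEMS = (
    "transparency of algorithms",
    "performance of models",
    "feasibility of reproduction",
    "risk of reproduction",
    "clinical implication",
)

def ivs_score(answers: dict) -> tuple[int, str]:
    """Count satisfied items and map the total to a (hypothetical) verdict."""
    score = sum(1 for item in IVS_ITEMS if answers.get(item, False))
    if score == len(IVS_ITEMS):
        verdict = "recommended"
    elif score >= 3:
        verdict = "warning"
    else:
        verdict = "not recommended"
    return score, verdict

print(ivs_score({item: True for item in IVS_ITEMS}))  # (5, 'recommended')
```

A real reimplementation would need the per-item criteria and thresholds from the paper itself.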
Affiliation(s)
- Yue Cai
- China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
- China Medical University, Shenyang, 110122, China
- Li-Ying Tang
- China Medical University, Shenyang, 110122, China
- Yi-Han Wang
- China Medical University, Shenyang, 110122, China
- Mengchun Gong
- Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
- Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
- Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
- Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
- Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
- Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China.
- Da-Xin Gong
- Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
- The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
- Guang-Wei Zhang
- Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China.
- The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China.
20
Zrubka Z, Kertész G, Gulácsi L, Czere J, Hölgyesi Á, Nezhad HM, Mosavi A, Kovács L, Butte AJ, Péntek M. The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review. J Med Internet Res 2024; 26:e47430. [PMID: 38241075 PMCID: PMC10837761 DOI: 10.2196/47430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Revised: 04/29/2023] [Accepted: 11/17/2023] [Indexed: 01/23/2024] Open
Abstract
BACKGROUND Diabetes mellitus (DM) is a major health concern among children with the widespread adoption of advanced technologies. However, concerns are growing about the transparency, replicability, bias, and overall validity of artificial intelligence studies in medicine. OBJECTIVE We aimed to systematically review the reporting quality of machine learning (ML) studies of pediatric DM using the Minimum Information About Clinical Artificial Intelligence Modelling (MI-CLAIM) checklist, a general reporting guideline for medical artificial intelligence studies. METHODS We searched the PubMed and Web of Science databases from 2016 to 2020. Studies were included if the use of ML was reported in children with DM aged 2 to 18 years, including studies on complications, screening studies, and in silico samples. In studies following the ML workflow of training, validation, and testing of results, reporting quality was assessed via MI-CLAIM by consensus judgments of independent reviewer pairs. Positive answers to the 17 binary items regarding sufficient reporting were qualitatively summarized and counted as a proxy measure of reporting quality. The synthesis of results included testing the association of reporting quality with publication and data type, participants (human or in silico), research goals, level of code sharing, and the scientific field of publication (medical or engineering), as well as with expert judgments of clinical impact and reproducibility. RESULTS After screening 1043 records, 28 studies were included. The sample size of the training cohort ranged from 5 to 561. Six studies featured only in silico patients. The reporting quality was low, with great variation among the 21 studies assessed using MI-CLAIM. The number of items with sufficient reporting ranged from 4 to 12 (mean 7.43, SD 2.62).
The items on research questions and data characterization were reported adequately most often, whereas items on patient characteristics and model examination were reported adequately least often. The representativeness of the training and test cohorts to real-world settings and the adequacy of model performance evaluation were the most difficult to judge. Reporting quality improved over time (r=0.50; P=.02); it was higher than average in prognostic biomarker and risk factor studies (P=.04) and lower in noninvasive hypoglycemia detection studies (P=.006), higher in studies published in medical versus engineering journals (P=.004), and higher in studies sharing any code of the ML pipeline versus not sharing (P=.003). The association between expert judgments and MI-CLAIM ratings was not significant. CONCLUSIONS The reporting quality of ML studies in the pediatric population with DM was generally low. Important details for clinicians, such as patient characteristics; comparison with the state-of-the-art solution; and model examination for valid, unbiased, and robust results, were often the weak points of reporting. To assess their clinical utility, the reporting standards of ML studies must evolve, and algorithms for this challenging population must become more transparent and replicable.
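The proxy score described above is simply a count of sufficiently reported MI-CLAIM items, and the time trend is a correlation of that count with publication year. A stdlib-only sketch of both calculations, on toy data rather than the study's actual ratings:

```python
from math import sqrt

def miclaim_score(answers):
    """Proxy reporting-quality score: count of 'sufficiently reported'
    answers across the 17 binary MI-CLAIM items."""
    assert len(answers) == 17, "MI-CLAIM has 17 binary items"
    return sum(bool(a) for a in answers)

def pearson_r(xs, ys):
    """Pearson correlation coefficient, as used for the time trend."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Toy example: hypothetical per-study scores against publication year.
years = [2016, 2017, 2018, 2019, 2020]
scores = [4, 6, 7, 9, 12]
print(f"r = {pearson_r(years, scores):.2f}")
```

The hypothesis tests reported in the abstract (P values for subgroup differences) would additionally require a significance test, which this sketch omits.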
Affiliation(s)
- Zsombor Zrubka
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Gábor Kertész
- John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
- László Gulácsi
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- János Czere
- Doctoral School of Innovation Management, Óbuda University, Budapest, Hungary
- Áron Hölgyesi
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Doctoral School of Molecular Medicine, Semmelweis University, Budapest, Hungary
- Hossein Motahari Nezhad
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Doctoral School of Business and Management, Corvinus University of Budapest, Budapest, Hungary
- Amir Mosavi
- John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
- Levente Kovács
- Physiological Controls Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Atul J Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States
- Márta Péntek
- HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
21
|
Ying TT, Zhuang LY, Xu SH, Zhang SF, Huang LJ, Gao WW, Liu L, Lai QL, Lou Y, Liu XL. Identification of Dementia & Mild Cognitive Impairment in Chinese Elderly Using Machine Learning. Am J Alzheimers Dis Other Demen 2024; 39:15333175241275215. [PMID: 39133478 PMCID: PMC11320688 DOI: 10.1177/15333175241275215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/13/2024]
Abstract
OBJECTIVE To assess the role of machine learning (ML) in identifying critical factors of dementia and mild cognitive impairment. METHODS 371 elderly individuals were ultimately included in the ML analysis. Demographic information (including gender, age, parity, visual acuity, auditory function, mobility, and medication history) and 35 features from 10 assessment scales were used for modeling. Five machine learning classifiers were used for evaluation, employing a procedure involving feature extraction, selection, model training, and performance assessment to identify key indicative factors. RESULTS The Random Forest model, after data preprocessing, Information Gain feature selection, and meta-feature analysis, utilized three training features and four meta-features, achieving an area under the curve of 0.961 and an accuracy of 0.894, showcasing exceptional accuracy for the identification of dementia and mild cognitive impairment. CONCLUSIONS ML serves as an identification tool for dementia and mild cognitive impairment. Using Information Gain and meta-feature analysis, Clinical Dementia Rating (CDR) and Neuropsychiatric Inventory (NPI) scale information emerged as crucial for training the Random Forest model.
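The pipeline above ranks candidate features by Information Gain before training a Random Forest. As a self-contained illustration of that selection criterion only (toy binary data, not the study's scale features or its model), the gain of a discrete feature with respect to the labels can be computed as:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction in `labels` when splitting on a discrete `feature`."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

# A feature that perfectly separates the labels has gain equal to the
# label entropy (1 bit here); an uninformative feature has gain 0.
y = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], y))  # 1.0
print(information_gain([0, 1, 0, 1], y))  # 0.0
```

In practice one would compute this gain for every candidate feature and keep the top-ranked ones before fitting the classifier.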
Affiliation(s)
- Tong-Tong Ying
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Li-Ying Zhuang
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Shan-Hu Xu
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Shu-Feng Zhang
- Second Department of Geriatrics, Weifang People’s Hospital, Weifang, China
- Li-Jun Huang
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Wei-Wei Gao
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Lu Liu
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Qi-Lun Lai
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Yue Lou
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Xiao-Li Liu
- Department of Neurology, Zhejiang Hospital, Hangzhou, China
22
Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024; 165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
OBJECTIVE To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology. STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE database between December 1, 2022, and December 31, 2022, for studies developing a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices. RESULTS We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflicts of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs.
CONCLUSION The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of benefits and best practices of open science are needed for prediction research in oncology.
Affiliation(s)
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom.
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA; Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Patricia Logullo
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer A de Beyer
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Michael M Schlussel
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
23
Carrasco-Ribelles LA, Llanes-Jurado J, Gallego-Moll C, Cabrera-Bean M, Monteagudo-Zaragoza M, Violán C, Zabaleta-del-Olmo E. Prediction models using artificial intelligence and longitudinal data from electronic health records: a systematic methodological review. J Am Med Inform Assoc 2023; 30:2072-2082. [PMID: 37659105 PMCID: PMC10654870 DOI: 10.1093/jamia/ocad168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 08/02/2023] [Accepted: 08/11/2023] [Indexed: 09/04/2023] Open
Abstract
OBJECTIVE To describe and appraise the use of artificial intelligence (AI) techniques that can cope with longitudinal data from electronic health records (EHRs) to predict health-related outcomes. METHODS This review included studies in any language in which an EHR was at least one of the data sources, longitudinal data were collected, an AI technique capable of handling longitudinal data was used, and a health-related outcome was predicted. We searched MEDLINE, Scopus, Web of Science, and IEEE Xplore from inception to January 3, 2022. Information on the dataset, prediction task, data preprocessing, feature selection, method, validation, performance, and implementation was extracted and summarized using descriptive statistics. Risk of bias and completeness of reporting were assessed using a short form of PROBAST and TRIPOD, respectively. RESULTS Eighty-one studies were included. Follow-up time and number of records per patient varied greatly, and most predicted disease development or the next event based on diagnoses and drug treatments. Architectures were generally based on recurrent neural network (RNN)-like layers, though in recent years combining different layers or using transformers has become more popular. About half of the included studies performed hyperparameter tuning and used attention mechanisms. Most performed a single train-test partition and could not correctly assess the variability of the model's performance. Reporting quality was poor, and a third of the studies were at high risk of bias. CONCLUSIONS AI models are increasingly using longitudinal data. However, the heterogeneity in reporting methodology and results, and the lack of public EHR datasets and code sharing, complicate replication. REGISTRATION PROSPERO database (CRD42022331388).
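A recurring weakness noted in this abstract is reliance on a single train-test partition, which yields a point estimate of performance with no sense of its variability. A minimal, stdlib-only sketch of the alternative, repeated random splitting, using synthetic one-dimensional data and a toy threshold classifier (purely illustrative, not any reviewed study's model):

```python
import random
from statistics import mean, stdev

random.seed(0)

# Synthetic 1-D data: class 1 is shifted to the right of class 0.
data = [(random.gauss(mu, 1.0), label)
        for label, mu in ((0, 0.0), (1, 1.5)) for _ in range(100)]

def accuracy(test_set, threshold):
    """Fraction of test points whose side of the threshold matches the label."""
    return mean(1.0 if (x > threshold) == bool(y) else 0.0 for x, y in test_set)

scores = []
for _ in range(50):  # 50 repeated random 70/30 splits
    random.shuffle(data)
    train, test = data[:140], data[140:]
    # "Fit": decision threshold halfway between the per-class training means.
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    scores.append(accuracy(test, (m0 + m1) / 2))

print(f"test accuracy: {mean(scores):.3f} +/- {stdev(scores):.3f} over 50 splits")
```

Reporting the spread across splits (or across cross-validation folds) is what a single partition cannot provide.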
Affiliation(s)
- Lucía A Carrasco-Ribelles
- Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Barcelona, 08007, Spain
- Department of Signal Theory and Communications, Universitat Politècnica de Catalunya (UPC), Barcelona, 08034, Spain
- Unitat de Suport a la Recerca Metropolitana Nord, Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Mataró, 08303, Spain
- José Llanes-Jurado
- Instituto de Investigación e Innovación en Bioingeniería (i3B), Universitat Politècnica de València (UPV), València, 46022, Spain
- Carlos Gallego-Moll
- Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Barcelona, 08007, Spain
- Unitat de Suport a la Recerca Metropolitana Nord, Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Mataró, 08303, Spain
- Margarita Cabrera-Bean
- Department of Signal Theory and Communications, Universitat Politècnica de Catalunya (UPC), Barcelona, 08034, Spain
- Mònica Monteagudo-Zaragoza
- Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Barcelona, 08007, Spain
- Concepción Violán
- Unitat de Suport a la Recerca Metropolitana Nord, Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Mataró, 08303, Spain
- Direcció d’Atenció Primària Metropolitana Nord, Institut Català de Salut, Badalona, 08915, Spain
- Fundació Institut d’Investigació en ciències de la salut Germans Trias i Pujol (IGTP), Badalona, 08916, Spain
- Fundació UAB, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, 08193, Spain
- Edurne Zabaleta-del-Olmo
- Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Barcelona, 08007, Spain
- Gerència Territorial de Barcelona, Institut Català de la Salut, Carrer de Balmes 22, Barcelona, 08007, Spain
- Nursing Department, Faculty of Nursing, Universitat de Girona, Girona, 17003, Spain
24
|
Truchot A, Raynaud M, Loupy A. The authors reply. Kidney Int 2023; 104:1036. [PMID: 37863625 DOI: 10.1016/j.kint.2023.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2023] [Accepted: 07/26/2023] [Indexed: 10/22/2023]
Affiliation(s)
- Agathe Truchot
- Paris Institute for Transplantation and Organ Regeneration, Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), U-970, AP-HP, Paris, France
- Marc Raynaud
- Paris Institute for Transplantation and Organ Regeneration, Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), U-970, AP-HP, Paris, France
- Alexandre Loupy
- Paris Institute for Transplantation and Organ Regeneration, Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), U-970, AP-HP, Paris, France.
25
Ng FYC, Thirunavukarasu AJ, Cheng H, Tan TF, Gutierrez L, Lan Y, Ong JCL, Chong YS, Ngiam KY, Ho D, Wong TY, Kwek K, Doshi-Velez F, Lucey C, Coffman T, Ting DSW. Artificial intelligence education: An evidence-based medicine approach for consumers, translators, and developers. Cell Rep Med 2023; 4:101230. [PMID: 37852174 PMCID: PMC10591047 DOI: 10.1016/j.xcrm.2023.101230] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2023] [Revised: 09/04/2023] [Accepted: 09/15/2023] [Indexed: 10/20/2023]
Abstract
Current and future healthcare professionals are generally not trained to cope with the proliferation of artificial intelligence (AI) technology in healthcare. To design a curriculum that caters to variable baseline knowledge and skills, clinicians may be conceptualized as "consumers", "translators", or "developers". The changes required of medical education because of AI innovation are linked to those brought about by evidence-based medicine (EBM). We outline a core curriculum for AI education of future consumers, translators, and developers, emphasizing the links between AI and EBM, with suggestions for how teaching may be integrated into existing curricula. We consider the key barriers to implementation of AI in the medical curriculum: time, resources, variable interest, and knowledge retention. By improving AI literacy rates and fostering a translator- and developer-enriched workforce, innovation may be accelerated for the benefit of patients and practitioners.
Affiliation(s)
- Faye Yu Ci Ng
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Arun James Thirunavukarasu
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore; University of Cambridge School of Clinical Medicine, Cambridge, UK; Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, UK
- Haoran Cheng
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore; Rollins School of Public Health, Emory University, Atlanta, GA, USA; Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Ting Fang Tan
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore
- Laura Gutierrez
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore
- Yanyan Lan
- Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China
- Yap Seng Chong
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Dean's Office, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Kee Yuan Ngiam
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Biomedical Engineering, School of Engineering, National University of Singapore, Singapore, Singapore
- Dean Ho
- Biomedical Engineering, School of Engineering, National University of Singapore, Singapore, Singapore; Institute for Digital Medicine (WisDM), N.1 Institute for Health, National University of Singapore, Singapore, Singapore; Department of Pharmacology, National University of Singapore, Singapore, Singapore
- Tien Yin Wong
- Tsinghua Medicine, Tsinghua University, Beijing, China
- Kenneth Kwek
- Chief Executive Office, Singapore General Hospital, SingHealth, Singapore, Singapore
- Finale Doshi-Velez
- Harvard Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Catherine Lucey
- Executive Vice Chancellor and Provost Office, University of California, San Francisco, San Francisco, CA, USA
- Thomas Coffman
- Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Daniel Shu Wei Ting
- Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore, Singapore; Duke-NUS Medical School, National University of Singapore, Singapore, Singapore; Byers Eye Institute, Stanford University, Palo Alto, CA, USA.
26
Stam WT, Ingwersen EW, Ali M, Spijkerman JT, Kazemier G, Bruns ERJ, Daams F. Machine learning models in clinical practice for the prediction of postoperative complications after major abdominal surgery. Surg Today 2023; 53:1209-1215. [PMID: 36840764 PMCID: PMC10520164 DOI: 10.1007/s00595-023-02662-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 02/07/2023] [Indexed: 02/26/2023]
Abstract
Complications after surgery have a major impact on short- and long-term outcomes, and decades of technological advancement have not yet led to the eradication of their risk. The accurate prediction of complications, recently enhanced by the development of machine learning algorithms, has the potential to completely reshape surgical patient management. In this paper, we reflect on multiple issues facing the implementation of machine learning, from the development to the actual implementation of machine learning models in daily clinical practice, providing suggestions on the use of machine learning models for predicting postoperative complications after major abdominal surgery.
Affiliation(s)
- Wessel T Stam
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands
- AGEM Amsterdam Gastroenterology, Endocrinology and Metabolism, Amsterdam, The Netherlands
- Erik W Ingwersen
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands
- AGEM Amsterdam Gastroenterology, Endocrinology and Metabolism, Amsterdam, The Netherlands
- Mahsoem Ali
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands
- Jorik T Spijkerman
- Independent Consultant in Computational Intelligence, Amsterdam, The Netherlands
- Geert Kazemier
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands
- Emma R J Bruns
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands
- Freek Daams
- Department of Surgery, Amsterdam UMC Location Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands.
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, The Netherlands.
27
Cembrowska-Lech D, Krzemińska A, Miller T, Nowakowska A, Adamski C, Radaczyńska M, Mikiciuk G, Mikiciuk M. An Integrated Multi-Omics and Artificial Intelligence Framework for Advance Plant Phenotyping in Horticulture. BIOLOGY 2023; 12:1298. [PMID: 37887008 PMCID: PMC10603917 DOI: 10.3390/biology12101298] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 09/27/2023] [Accepted: 09/28/2023] [Indexed: 10/28/2023]
Abstract
This review discusses the transformative potential of integrating multi-omics data and artificial intelligence (AI) in advancing horticultural research, specifically plant phenotyping. The traditional methods of plant phenotyping, while valuable, are limited in their ability to capture the complexity of plant biology. The advent of (meta-)genomics, (meta-)transcriptomics, proteomics, and metabolomics has provided an opportunity for a more comprehensive analysis. AI and machine learning (ML) techniques can effectively handle the complexity and volume of multi-omics data, providing meaningful interpretations and predictions. Reflecting the multidisciplinary nature of this area of research, in this review, readers will find a collection of state-of-the-art solutions that are key to the integration of multi-omics data and AI for phenotyping experiments in horticulture, including experimental design considerations with several technical and non-technical challenges, which are discussed along with potential solutions. The future prospects of this integration include precision horticulture, predictive breeding, improved disease and stress response management, sustainable crop management, and exploration of plant biodiversity. The integration of multi-omics and AI holds immense promise for revolutionizing horticultural research and applications, heralding a new era in plant phenotyping.
Collapse
Affiliation(s)
- Danuta Cembrowska-Lech
- Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Adrianna Krzemińska
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Institute of Biology, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
- Tymoteusz Miller
- Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
- Anna Nowakowska
- Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland
- Cezary Adamski
- Institute of Biology, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
- Grzegorz Mikiciuk
- Department of Horticulture, Faculty of Environmental Management and Agriculture, West Pomeranian University of Technology in Szczecin, Słowackiego 17, 71-434 Szczecin, Poland
- Małgorzata Mikiciuk
- Department of Bioengineering, Faculty of Environmental Management and Agriculture, West Pomeranian University of Technology in Szczecin, Słowackiego 17, 71-434 Szczecin, Poland
28
Abdulazeem H, Whitelaw S, Schauberger G, Klug SJ. A systematic review of clinical health conditions predicted by machine learning diagnostic and prognostic models trained or validated using real-world primary health care data. PLoS One 2023; 18:e0274276. [PMID: 37682909 PMCID: PMC10491005 DOI: 10.1371/journal.pone.0274276] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2022] [Accepted: 08/29/2023] [Indexed: 09/10/2023] Open
Abstract
With the advances in technology and data science, machine learning (ML) is being rapidly adopted by the health care sector. However, to date there is a lack of literature addressing the health conditions targeted by ML prediction models within primary health care (PHC). To fill this gap in knowledge, we conducted a systematic review following the PRISMA guidelines to identify health conditions targeted by ML in PHC. We searched the Cochrane Library, Web of Science, PubMed, Elsevier, BioRxiv, Association for Computing Machinery (ACM), and IEEE Xplore databases for studies published from January 1990 to January 2022. We included primary studies addressing ML diagnostic or prognostic predictive models that were supplied completely or partially by real-world PHC data. Study selection, data extraction, and risk of bias assessment using the prediction model study risk of bias assessment tool were performed by two investigators. Health conditions were categorized according to the International Classification of Diseases (ICD-10). Extracted data were analyzed quantitatively. We identified 106 studies investigating 42 health conditions. These studies included 207 ML prediction models supplied by the PHC data of 24.2 million participants from 19 countries. We found that 92.4% of the studies were retrospective and 77.3% reported diagnostic predictive ML models. A majority (76.4%) of the studies developed models without conducting external validation. Risk of bias assessment revealed that 90.8% of the studies were at high or unclear risk of bias. The most frequently reported health conditions were diabetes mellitus (19.8%) and Alzheimer's disease (11.3%). Our study provides a summary of the ML prediction models presently available within PHC. We draw the attention of digital health policy makers, ML model developers, and health care professionals to the need for more interdisciplinary research collaboration in this area.
Affiliation(s)
- Hebatullah Abdulazeem
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Sera Whitelaw
- Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Gunther Schauberger
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
- Stefanie J. Klug
- Chair of Epidemiology, Department of Sport and Health Sciences, Technical University of Munich (TUM), Munich, Germany
29
Vasey B, Collins GS. Invited Commentary: Transparent reporting of artificial intelligence models development and evaluation in surgery: The TRIPOD and DECIDE-AI checklists. Surgery 2023; 174:727-729. [PMID: 37244769 DOI: 10.1016/j.surg.2023.04.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Accepted: 04/27/2023] [Indexed: 05/29/2023]
Affiliation(s)
- Baptiste Vasey
- Nuffield Department of Surgical Sciences, University of Oxford, UK; Department of Surgery, Geneva University Hospital, Switzerland
- Gary S Collins
- Centre for Statistics in Medicine, UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, UK. http://www.twitter.com/GSCollins
30
Klement W, El Emam K. Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies: Development and Validation. J Med Internet Res 2023; 25:e48763. [PMID: 37651179 PMCID: PMC10502599 DOI: 10.2196/48763] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Revised: 07/11/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023] Open
Abstract
BACKGROUND The reporting of machine learning (ML) prognostic and diagnostic modeling studies is often inadequate, making it difficult to understand and replicate such studies. To address this issue, multiple consensus and expert reporting guidelines for ML studies have been published. However, these guidelines cover different parts of the analytics lifecycle, and individually, none of them provides a complete set of reporting requirements. OBJECTIVE We aimed to consolidate the ML reporting guidelines and checklists in the literature to provide reporting items for prognostic and diagnostic ML in in-silico and shadow-mode studies. METHODS We conducted a literature search that identified 192 unique peer-reviewed English articles providing guidance and checklists for reporting ML studies. The articles were screened by title and abstract against a set of 9 inclusion and exclusion criteria. Articles that passed screening had their quality evaluated by 2 raters using a 9-point checklist constructed from good practices for guideline development; the average κ was 0.71 across all quality criteria. The 17 articles with a quality score at or above the median were retained as high-quality source papers. The reporting items in these 17 articles were consolidated and screened against a set of 6 inclusion and exclusion criteria. The resulting reporting items were sent to an external group of 11 ML experts for review and updated accordingly. The updated checklist was used to assess the reporting in 6 recent modeling papers in JMIR AI. Feedback from the external review and initial validation efforts was used to improve the reporting items. RESULTS In total, 37 reporting items were identified and grouped into 5 categories based on the stage of the ML project: defining the study details, defining and collecting the data, modeling methodology, model evaluation, and explainability. None of the 17 source articles covered all the reporting items. The study details and data description reporting items were the most common in the source literature, with explainability and methodology guidance (ie, data preparation and model training) having the least coverage. For instance, a median of 75% of the data description reporting items appeared in each of the 17 high-quality source guidelines, but only a median of 33% of the explainability reporting items appeared. The highest-quality source articles tended to have more items on reporting study details; the other categories of reporting items were not related to source article quality. We converted the reporting items into a checklist to support more complete reporting. CONCLUSIONS Our findings support the need for a consolidated set of reporting items, given that existing high-quality guidelines and checklists do not individually provide complete coverage. The consolidated set of reporting items is expected to improve the quality and reproducibility of ML modeling studies.
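The inter-rater agreement statistic reported above (an average κ of 0.71 between the 2 raters) is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal Python sketch follows; the ratings are hypothetical, not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail judgements from two raters on 10 quality criteria.
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.6
```

Here the raters agree on 8 of 10 items (observed agreement 0.8), but their marginal rates alone would produce 0.5 expected agreement, so κ = (0.8 − 0.5)/(1 − 0.5) = 0.6.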
Affiliation(s)
- William Klement
- University of Ottawa, Ottawa, ON, Canada
- CHEO Research Institute, Ottawa, ON, Canada
- Khaled El Emam
- University of Ottawa, Ottawa, ON, Canada
- CHEO Research Institute, Ottawa, ON, Canada
31
Twait EL, Andaur Navarro CL, Gudnason V, Hu YH, Launer LJ, Geerlings MI. Dementia prediction in the general population using clinically accessible variables: a proof-of-concept study using machine learning. The AGES-Reykjavik study. BMC Med Inform Decis Mak 2023; 23:168. [PMID: 37641038 PMCID: PMC10463542 DOI: 10.1186/s12911-023-02244-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Accepted: 07/18/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Early identification of dementia is crucial for prompt intervention for high-risk individuals in the general population. External validation studies on prognostic models for dementia have highlighted the need for updated models. The use of machine learning in dementia prediction is in its infancy and may improve predictive performance. The current study aimed to explore the difference in performance between machine learning algorithms and traditional statistical techniques, such as logistic and Cox regression, for prediction of all-cause dementia. Our secondary aim was to assess the feasibility of using only clinically accessible predictors rather than MRI predictors. METHODS Data are from 4,793 participants in the population-based AGES-Reykjavik Study without dementia or mild cognitive impairment at baseline (mean age: 76 years; 59% female). Cognitive, biometric, and MRI assessments (59 variables in total) were collected at baseline, with follow-up of incident dementia diagnoses for a maximum of 12 years. Machine learning algorithms included elastic net regression, random forest, support vector machine, and elastic net Cox regression. Traditional statistical methods for comparison were logistic and Cox regression. Model 1 was fit using all variables, model 2 after feature selection using the Boruta package, and a third model explored performance when leaving out neuroimaging markers (clinically accessible model). Ten-fold cross-validation, repeated ten times, was implemented during training. Upsampling was used to account for imbalanced data. Tuning parameters were optimized automatically for recalibration using the caret package in R. RESULTS 19% of participants developed all-cause dementia. Machine learning algorithms were comparable in performance to logistic regression in all three models. However, slightly better performance was observed for the elastic net Cox regression in the third model (c = 0.78, 95% CI: 0.78-0.78) compared with the traditional Cox regression (c = 0.75, 95% CI: 0.74-0.77). CONCLUSIONS Supervised machine learning only showed added benefit when using survival techniques. Removing MRI markers did not significantly worsen model performance. Further, we presented a nomogram built with machine learning methods, showing transportability of machine learning models to clinical practice. External validation is needed to assess the use of this model in other populations. Identifying high-risk individuals will amplify prevention efforts and selection for clinical trials.
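The validation scheme the authors describe (ten-fold cross-validation repeated ten times, with minority-class upsampling and a penalised model) can be sketched in Python with scikit-learn rather than R's caret. This is an illustrative reconstruction on synthetic data, not the study's pipeline: the dataset, penalty settings, and class weights below are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold

# Synthetic stand-in for the cohort: ~19% of cases experience the outcome.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.81, 0.19], random_state=0)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
rng = np.random.default_rng(0)
aucs = []
for train_idx, test_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Upsample the minority class in the training fold only, so the
    # held-out fold keeps its original class balance.
    pos = np.flatnonzero(y_tr == 1)
    extra = rng.choice(pos, size=(y_tr == 0).sum() - pos.size, replace=True)
    idx = np.concatenate([np.arange(y_tr.size), extra])
    # Elastic-net-penalised logistic regression (the saga solver supports l1_ratio).
    model = LogisticRegression(penalty="elasticnet", l1_ratio=0.5,
                               solver="saga", max_iter=5000)
    model.fit(X_tr[idx], y_tr[idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(round(float(np.mean(aucs)), 2))  # mean c-statistic over 100 folds
```

Upsampling inside each training fold, rather than before splitting, is what keeps the cross-validated c-statistic honest: duplicated minority cases never leak into the evaluation fold.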
Affiliation(s)
- Emma L Twait
- Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht and Utrecht University, Utrecht, the Netherlands
- Department of General Practice, Amsterdam UMC, location Vrije Universiteit Amsterdam, De Boelelaan 1117, Amsterdam, the Netherlands
- Amsterdam Public Health, Aging & Later Life and Personalized Medicine, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Neurodegeneration and Mood, Anxiety, Psychosis, Stress, and Sleep, Amsterdam, the Netherlands
- Constanza L Andaur Navarro
- Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht and Utrecht University, Utrecht, the Netherlands
- Vilmundur Gudnason
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- The Icelandic Heart Association, Kopavogur, Iceland
- Yi-Han Hu
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Baltimore, MD, USA
- Lenore J Launer
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Baltimore, MD, USA
- Mirjam I Geerlings
- Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht and Utrecht University, Utrecht, the Netherlands
- Amsterdam Public Health, Aging & Later Life and Personalized Medicine, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Neurodegeneration and Mood, Anxiety, Psychosis, Stress, and Sleep, Amsterdam, the Netherlands
- Laboratory of Epidemiology and Population Sciences, National Institute on Aging, Baltimore, MD, USA
- Department of General Practice, Amsterdam UMC, location University of Amsterdam, Meibergdreef 9, Amsterdam, the Netherlands
32
Dhiman P, Ma J, Qi C, Bullock G, Sergeant JC, Riley RD, Collins GS. Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review. BMC Med Res Methodol 2023; 23:188. [PMID: 37598153 PMCID: PMC10439652 DOI: 10.1186/s12874-023-02008-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Accepted: 08/04/2023] [Indexed: 08/21/2023] Open
Abstract
BACKGROUND Having an appropriate sample size is important when developing a clinical prediction model. We aimed to review how sample size is considered in studies developing a prediction model for a binary outcome. METHODS We searched PubMed for studies published between 01/07/2020 and 30/07/2020 and reviewed the sample size calculations used to develop the prediction models. Using the available information, we calculated the minimum sample size that would be needed to estimate overall risk and minimise overfitting in each study, and summarised the difference between the calculated and the used sample size. RESULTS A total of 119 studies were included, of which nine (8%) provided a sample size justification. The recommended minimum sample size could be calculated for 94 studies: 73% (95% CI: 63-82%) used sample sizes lower than required to estimate overall risk and minimise overfitting, including 26% that used sample sizes lower than required to estimate overall risk only. A similar proportion did not meet the ≥10 events per variable (EPV) criterion (75%, 95% CI: 66-84%). The median deficit in the number of events used to develop a model was 75 [IQR: 234 lower to 7 higher], which reduced to 63 if the total available data (before any data splitting) was used [IQR: 225 lower to 7 higher]. Studies that met the minimum required sample size had a median c-statistic of 0.84 (IQR: 0.80 to 0.9), and studies where the minimum sample size was not met had a median c-statistic of 0.83 (IQR: 0.75 to 0.9). Studies that met the ≥10 EPV criterion had a median c-statistic of 0.80 (IQR: 0.73 to 0.84). CONCLUSIONS Prediction models are often developed with no sample size calculation; as a consequence, many are too small to precisely estimate the overall risk. We encourage researchers to justify, perform, and report sample size calculations when developing a prediction model.
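To make the two criteria above concrete, here is a small Python sketch of the ≥10 events-per-variable (EPV) rule and a precision-based minimum sample size for estimating overall risk (a 95% confidence interval of about ±δ around the outcome prevalence φ). The formula and the worked numbers are illustrative assumptions, not the paper's own calculations.

```python
import math

def min_n_overall_risk(phi: float, delta: float = 0.05) -> int:
    """Smallest n giving a 95% CI of about +/- delta around prevalence phi."""
    return math.ceil((1.96 / delta) ** 2 * phi * (1 - phi))

def meets_epv_rule(n: int, phi: float, n_params: int, epv: float = 10.0) -> bool:
    """True if the expected number of events (n * phi) covers epv per parameter."""
    return n * phi >= epv * n_params

# Hypothetical study: 500 patients, 10% outcome prevalence, 12 candidate predictors.
n, phi, p = 500, 0.10, 12
print(min_n_overall_risk(phi))    # → 139: minimum n for a +/- 0.05 CI on overall risk
print(meets_epv_rule(n, phi, p))  # → False: 50 expected events vs 120 required
```

Note that the precision criterion alone is easy to satisfy; the EPV check (and, more rigorously, the shrinkage-based criteria cited in the paper) is usually what drives the required sample size up.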
Affiliation(s)
- Paula Dhiman
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Jie Ma
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Cathy Qi
- Population Data Science, Faculty of Medicine, Health and Life Science, Swansea University Medical School, Swansea University, Singleton Park, Swansea, SA2 8PP, UK
- Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, UK
- Jamie C Sergeant
- Centre for Biostatistics, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PL, UK
- Centre for Epidemiology Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester Academic Health Science Centre, Manchester, M13 9PT, UK
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Gary S Collins
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
33
Deniffel D, McAlpine K, Harder FN, Jain R, Lawson KA, Healy GM, Hui S, Zhang X, Salinas-Miranda E, van der Kwast T, Finelli A, Haider MA. Predicting the recurrence risk of renal cell carcinoma after nephrectomy: potential role of CT-radiomics for adjuvant treatment decisions. Eur Radiol 2023; 33:5840-5850. [PMID: 37074425 DOI: 10.1007/s00330-023-09551-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2022] [Revised: 01/09/2023] [Accepted: 02/12/2023] [Indexed: 04/20/2023]
Abstract
OBJECTIVES Previous trial results suggest that only a small number of patients with non-metastatic renal cell carcinoma (RCC) benefit from adjuvant therapy. We assessed whether the addition of CT-based radiomics to established clinico-pathological biomarkers improves recurrence risk prediction for adjuvant treatment decisions. METHODS This retrospective study included 453 patients with non-metastatic RCC undergoing nephrectomy. Cox models were trained to predict disease-free survival (DFS) using post-operative biomarkers (age, stage, tumor size and grade) with and without radiomics selected on pre-operative CT. Models were assessed using C-statistic, calibration, and decision curve analyses (repeated tenfold cross-validation). RESULTS At multivariable analysis, one of four selected radiomic features (wavelet-HHL_glcm_ClusterShade) was prognostic for DFS with an adjusted hazard ratio (HR) of 0.44 (p = 0.02), along with American Joint Committee on Cancer (AJCC) stage group (III versus I, HR 2.90; p = 0.002), grade 4 (versus grade 1, HR 8.90; p = 0.001), age (per 10 years HR 1.29; p = 0.03), and tumor size (per cm HR 1.13; p = 0.003). The discriminatory ability of the combined clinical-radiomic model (C = 0.80) was superior to that of the clinical model (C = 0.78; p < 0.001). Decision curve analysis revealed a net benefit of the combined model when used for adjuvant treatment decisions. At an exemplary threshold probability of ≥ 25% for disease recurrence within 5 years, using the combined versus the clinical model was equivalent to treating 9 additional patients (per 1000 assessed) who would recur without treatment (i.e., true-positive predictions) with no increase in false-positive predictions. CONCLUSION Adding CT-based radiomic features to established prognostic biomarkers improved post-operative recurrence risk assessment in our internal validation study and may help guide decisions regarding adjuvant therapy. 
KEY POINTS In patients with non-metastatic renal cell carcinoma undergoing nephrectomy, CT-based radiomics combined with established clinical and pathological biomarkers improved recurrence risk assessment. Compared with a clinical base model, the combined risk model offered superior clinical utility when used to guide decisions on adjuvant treatment.
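The decision curve analysis used above rests on the net benefit quantity NB(pt) = TP/n − (FP/n) · pt/(1 − pt), evaluated at a threshold probability pt: true positives counted against false positives, weighted by the odds at the threshold. The following Python sketch is a standard formulation on simulated data, not the study's code.

```python
import numpy as np

def net_benefit(y_true, risk, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    y_true = np.asarray(y_true)
    treat = np.asarray(risk) >= pt
    n = y_true.size
    tp = np.sum(treat & (y_true == 1))  # recurrences correctly treated
    fp = np.sum(treat & (y_true == 0))  # unnecessary treatments
    return tp / n - fp / n * pt / (1 - pt)

# Simulated cohort with an informative but imperfect risk score.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 1000)
risk = np.clip(0.5 * y + rng.normal(0.25, 0.2, 1000), 0, 1)

pt = 0.25  # the exemplary 25% threshold probability from the abstract
nb_model = net_benefit(y, risk, pt)
nb_all = net_benefit(y, np.ones(1000), pt)  # "treat everyone" strategy
print(round(float(nb_model), 3), round(float(nb_all), 3))
```

Sweeping pt over a clinically plausible range and plotting net benefit for the model, "treat all" (risk fixed at 1), and "treat none" (net benefit 0) reproduces a decision curve; a model is useful at thresholds where its curve lies above both default strategies.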
Affiliation(s)
- Dominik Deniffel
- Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON, M5G 1X5, Canada
- Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada
- Kristen McAlpine
- Division of Urology, Department of Surgical Oncology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Felix N Harder
- Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON, M5G 1X5, Canada
- Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada
- Rahi Jain
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Keith A Lawson
- Division of Urology, Department of Surgical Oncology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Gerard M Healy
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON, M5G 1X5, Canada
- Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada
- Department of Radiology, St Vincent's University Hospital, Dublin, Ireland
- Shirley Hui
- Division of Urology, Department of Surgical Oncology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Xiaoyu Zhang
- Division of Urology, Department of Surgical Oncology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Emmanuel Salinas-Miranda
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON, M5G 1X5, Canada
- Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada
- Theodorus van der Kwast
- Department of Pathology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Antonio Finelli
- Division of Urology, Department of Surgical Oncology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Masoom A Haider
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 600 University Avenue, Toronto, ON, M5G 1X5, Canada
- Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada
34
Hassan N, Slight R, Morgan G, Bates DW, Gallier S, Sapey E, Slight S. Road map for clinicians to develop and evaluate AI predictive models to inform clinical decision-making. BMJ Health Care Inform 2023; 30:e100784. [PMID: 37558245 PMCID: PMC10414079 DOI: 10.1136/bmjhci-2023-100784] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Accepted: 07/24/2023] [Indexed: 08/11/2023] Open
Abstract
BACKGROUND Predictive models have been used in clinical care for decades. They can determine the risk of a patient developing a particular condition or complication and inform the shared decision-making process. Developing artificial intelligence (AI) predictive models for use in clinical practice is challenging; even good predictive performance does not guarantee that a model will be used or will enhance decision-making. We describe nine stages of developing and evaluating a predictive AI model, recognising the challenges that clinicians might face at each stage and providing practical tips to help manage them. FINDINGS The nine stages included clarifying the clinical question or outcome(s) of interest (output), identifying appropriate predictors (feature selection), choosing relevant datasets, developing the AI predictive model, validating and testing the developed model, presenting and interpreting the model prediction(s), licensing and maintaining the AI predictive model, and evaluating the impact of the AI predictive model. The introduction of an AI prediction model into clinical practice usually involves multiple interacting components, including the accuracy of the model predictions, physician and patient understanding and use of these probabilities, the expected effectiveness of subsequent actions or interventions, and adherence to these. Much of the difference in whether benefits are realised relates to whether the predictions are given to clinicians in a timely way that enables them to take an appropriate action. CONCLUSION The downstream effects of AI prediction models on processes and outcomes vary widely, and it is essential to evaluate their use in clinical practice with an appropriate study design.
Affiliation(s)
- Nehal Hassan
- School of Pharmacy, Newcastle University, Newcastle upon Tyne, UK
- Faculty of Medical Sciences, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Robert Slight
- Faculty of Medical Sciences, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Freeman Hospital, Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Graham Morgan
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
- David W Bates
- Department of General Internal Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Suzy Gallier
- PIONEER Health Data Research Hub, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Department of Health Informatics, PIONEER Health Data Research Hub, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Elizabeth Sapey
- PIONEER Health Data Research Hub, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Department of Health Informatics, PIONEER Health Data Research Hub, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Sarah Slight
- School of Pharmacy, Newcastle University, Newcastle upon Tyne, UK
- Faculty of Medical Sciences, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
35
Logullo P, MacCarthy A, Dhiman P, Kirtley S, Ma J, Bullock G, Collins GS. Artificial intelligence in lung cancer diagnostic imaging: a review of the reporting and conduct of research published 2018-2019. BJR Open 2023; 5:20220033. [PMID: 37389003 PMCID: PMC10301715 DOI: 10.1259/bjro.20220033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2022] [Revised: 04/04/2023] [Accepted: 04/04/2023] [Indexed: 07/01/2023] Open
Abstract
Objective This study aimed to describe the methodologies used to develop and evaluate models that use artificial intelligence (AI) to analyse lung images in order to detect, segment (outline the borders of), or classify pulmonary nodules as benign or malignant. Methods In October 2019, we systematically searched the literature for original studies published between 2018 and 2019 that described prediction models using AI to evaluate human pulmonary nodules on diagnostic chest images. Two evaluators independently extracted information from studies, such as study aims, sample size, AI type, patient characteristics, and performance. We summarised data descriptively. Results The review included 153 studies: 136 (89%) development-only studies, 12 (8%) development and validation, and 5 (3%) validation-only. CT scans were the most common image type used (83%), often acquired from public databases (58%). Eight studies (5%) compared model outputs with biopsy results, and 41 studies (26.8%) reported patient characteristics. The models were based on different units of analysis, such as patients, images, nodules, or image slices or patches. Conclusion The methods used to develop and evaluate prediction models using AI to detect, segment, or classify pulmonary nodules in medical imaging vary, are poorly reported, and are therefore difficult to evaluate. Transparent and complete reporting of methods, results, and code would fill the gaps in information we observed in the study publications. Advances in knowledge We reviewed the methodology of AI models detecting nodules on lung images and found that the models were poorly reported, often lacked descriptions of patient characteristics, and rarely compared model outputs with biopsy results. When lung biopsy is not available, Lung-RADS could help standardise comparisons between the human radiologist and the machine. The field of radiology should not abandon principles of diagnostic accuracy studies, such as choosing the correct ground truth, simply because AI is used. Clear and complete reporting of the reference standard used would help radiologists trust the performance that AI models claim to have. This review presents clear recommendations about the essential methodological aspects of diagnostic models that should be incorporated in studies using AI to help detect or segment lung nodules. The manuscript also reinforces the need for more complete and transparent reporting, which can be supported by the recommended reporting guidelines.
Affiliation(s)
- Garrett Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States
36
Garbin C, Marques N, Marques O. Machine learning for predicting opioid use disorder from healthcare data: A systematic review. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 236:107573. [PMID: 37148670 DOI: 10.1016/j.cmpb.2023.107573] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/26/2023] [Indexed: 05/08/2023]
Abstract
INTRODUCTION The US opioid epidemic has been one of the leading causes of injury-related deaths, according to the CDC Injury Center. The increasing availability of data and tools for machine learning (ML) has resulted in more researchers creating datasets and models to help analyze and mitigate the crisis. This review investigates peer-reviewed journal papers that applied ML models to predict opioid use disorder (OUD). The review is split into two parts. The first part summarizes the current research in OUD prediction with ML. The second part evaluates how ML techniques and processes were used to achieve these results and suggests improvements to refine further attempts to use ML for OUD prediction. METHODS The review includes peer-reviewed journal papers published in or after 2012 that use healthcare data to predict OUD. We searched Google Scholar, Semantic Scholar, PubMed, IEEE Xplore, and Science.gov in September 2022. Data extracted include the study's goal, dataset used, cohort selected, types of ML models created, model evaluation metrics, and the details of the ML tools and techniques used to create the models. RESULTS The review analyzed 16 papers. Three papers created their own dataset, five used a publicly available dataset, and the remaining eight used a private dataset. Cohort sizes ranged from the low hundreds to over half a million. Six papers used one type of ML model, and the remaining ten used up to five different ML models. The reported ROC AUC was higher than 0.8 for all but one of the papers. Five papers used only non-interpretable models, and the other 11 used interpretable models exclusively or in combination with non-interpretable ones. The interpretable models achieved the highest or second-highest ROC AUC values. Most papers did not sufficiently describe the ML techniques and tools used to produce their results, and only three papers published their source code. CONCLUSIONS We found that while there are indications that ML methods applied to OUD prediction may be valuable, the lack of detail and transparency in creating the ML models limits their usefulness. We end the review with recommendations to improve studies on this critical healthcare subject.
Affiliation(s)
- Christian Garbin: Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Nicholas Marques: Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
- Oge Marques: Department of Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA

37
Yasrebi-de Kom IAR, Dongelmans DA, de Keizer NF, Jager KJ, Schut MC, Abu-Hanna A, Klopotowska JE. Electronic health record-based prediction models for in-hospital adverse drug event diagnosis or prognosis: a systematic review. J Am Med Inform Assoc 2023; 30:978-988. [PMID: 36805926 PMCID: PMC10114128 DOI: 10.1093/jamia/ocad014] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/22/2023] Open
Abstract
OBJECTIVE We conducted a systematic review to characterize and critically appraise developed prediction models based on structured electronic health record (EHR) data for adverse drug event (ADE) diagnosis and prognosis in adult hospitalized patients. MATERIALS AND METHODS We searched the Embase and Medline databases (from January 1, 1999, to July 4, 2022) for articles utilizing structured EHR data to develop ADE prediction models for adult inpatients. For our systematic evidence synthesis and critical appraisal, we applied the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). RESULTS Twenty-five articles were included. Studies often did not report crucial information such as patient characteristics or the method for handling missing data. In addition, studies frequently applied inappropriate methods, such as univariable screening for predictor selection. Furthermore, the majority of the studies utilized ADE labels that only described an adverse symptom while not assessing causality or utilizing a causal model. None of the models were externally validated. CONCLUSIONS Several challenges should be addressed before the models can be widely implemented, including the adherence to reporting standards and the adoption of best practice methods for model development and validation. In addition, we propose a reorientation of the ADE prediction modeling domain to include causality as a fundamental challenge that needs to be addressed in future studies, either through acquiring ADE labels via formal causality assessments or the usage of adverse event labels in combination with causal prediction modeling.
Affiliation(s)
- Izak A R Yasrebi-de Kom: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands
- Dave A Dongelmans: Amsterdam Public Health, Amsterdam, The Netherlands; Department of Intensive Care Medicine, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Nicolette F de Keizer: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands
- Kitty J Jager: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands; Amsterdam Cardiovascular Sciences, Pulmonary Hypertension & Thrombosis, Amsterdam, The Netherlands
- Martijn C Schut: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands; Department of Clinical Chemistry, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ameen Abu-Hanna: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands
- Joanna E Klopotowska: Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands

38
Munguía-Realpozo P, Etchegaray-Morales I, Mendoza-Pinto C, Méndez-Martínez S, Osorio-Peña ÁD, Ayón-Aguilar J, García-Carrasco M. Current state and completeness of reporting clinical prediction models using machine learning in systemic lupus erythematosus: A systematic review. Autoimmun Rev 2023; 22:103294. [PMID: 36791873 DOI: 10.1016/j.autrev.2023.103294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Accepted: 02/09/2023] [Indexed: 02/17/2023]
Abstract
OBJECTIVE We carried out a systematic review (SR) of reporting adherence in diagnostic and prognostic applications of machine learning (ML) in systemic lupus erythematosus (SLE), using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. METHODS An SR employing five databases was conducted from inception until December 2021. We identified articles that evaluated the use of ML for prognostic and/or diagnostic purposes. The SR was reported according to the PRISMA guidelines. Adherence to reporting standards was assessed with the TRIPOD statement, and risk of bias with the PROBAST tool. RESULTS We included 45 studies: 29 (64.4%) diagnostic and 16 (35.5%) prognostic prediction-model studies. Overall, articles adhered to between 17% and 67% of TRIPOD items (median 43%, IQR 37-49%). Only a few articles reported the model's predictive performance (2.3%, 95% CI 0.06-12.0), testing of interaction terms (2.3%, 95% CI 0.06-12.0), flow of participants (50%, 95% CI 34.6-65.4), blinding of predictors (2.3%, 95% CI 0.06-12.0), handling of missing data (36.4%, 95% CI 22.4-52.2), and an appropriate title (20.5%, 95% CI 9.8-35.3). Some items were almost completely reported: the source of data (88.6%, 95% CI 75.4-96.2), eligibility criteria (86.4%, 95% CI 76.2-96.5), and interpretation of findings (88.6%, 95% CI 75.4-96.2). In addition, most model studies had a high risk of bias. CONCLUSIONS The reporting adherence of ML-based models developed for SLE is currently inadequate. Several items deemed crucial for transparent reporting were not fully reported in studies on ML-based prediction models. REVIEW REGISTRATION PROSPERO ID# CRD42021284881 (amended to limit the scope).
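The per-item proportions above, such as 1 of 44 studies giving 2.3% (95% CI 0.06-12.0), look like exact binomial (Clopper-Pearson) intervals. The abstract does not state the CI method, so that is an assumption; as an illustration only, such an interval can be computed from beta-distribution quantiles:

```python
# Illustrative sketch: exact (Clopper-Pearson) binomial confidence interval.
# The CI method is an assumption; it is not stated in the abstract above.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion of k events in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# 1 of 44 studies reporting an item -> roughly 2.3% (95% CI 0.06%-12.0%).
lo, hi = clopper_pearson(1, 44)
print(f"{1/44:.1%} (95% CI {lo:.2%}-{hi:.2%})")
```

Running this reproduces the interval shown for the 1/44 items, which supports (but does not prove) the Clopper-Pearson assumption.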
Affiliation(s)
- Pamela Munguía-Realpozo: Systemic Autoimmune Diseases Research Unit, Specialties Hospital UMAE-CIBIOR, Mexican Institute for Social Security, Puebla, Mexico; Department of Rheumatology, Medicine School, Meritorious Autonomous University of Puebla, Mexico
- Ivet Etchegaray-Morales: Department of Rheumatology, Medicine School, Meritorious Autonomous University of Puebla, Mexico
- Claudia Mendoza-Pinto: Systemic Autoimmune Diseases Research Unit, Specialties Hospital UMAE-CIBIOR, Mexican Institute for Social Security, Puebla, Mexico; Department of Rheumatology, Medicine School, Meritorious Autonomous University of Puebla, Mexico
- Ángel David Osorio-Peña: Department of Rheumatology, Medicine School, Meritorious Autonomous University of Puebla, Mexico
- Jorge Ayón-Aguilar: Coordination of Health Research, Mexican Social Security Institute, Puebla, Mexico
- Mario García-Carrasco: Department of Rheumatology, Medicine School, Meritorious Autonomous University of Puebla, Mexico

39
Marwaha JS, Chen HW, Habashy K, Choi J, Spain DA, Brat GA. Appraising the Quality of Development and Reporting in Surgical Prediction Models. JAMA Surg 2023; 158:214-216. [PMID: 36449299 PMCID: PMC9713676 DOI: 10.1001/jamasurg.2022.4488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Accepted: 07/23/2022] [Indexed: 12/03/2022]
Abstract
This cross-sectional study uses the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis reporting guideline to assess 120 published studies about surgical prediction models.
Affiliation(s)
- Jayson S Marwaha: Department of Surgery, Beth Israel Deaconess Medical Center, Boston, Massachusetts; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Hao Wei Chen: Department of Surgery, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Karl Habashy: American University of Beirut Medical Center, Beirut, Lebanon
- Jeff Choi: Department of Surgery, Stanford University, Palo Alto, California; Department of Biomedical Data Science, Stanford University, Palo Alto, California
- David A Spain: Department of Surgery, Stanford University, Palo Alto, California
- Gabriel A Brat: Department of Surgery, Beth Israel Deaconess Medical Center, Boston, Massachusetts; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts

40
Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, Ma J, Collins GS, Bajpai R, Riley RD, Moons KGM, Hooft L. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol 2023; 154:8-22. [PMID: 36436815 DOI: 10.1016/j.jclinepi.2022.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 10/09/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND AND OBJECTIVES We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS We searched PubMed for articles published between 01/01/2018 and 31/12/2019 describing the development, or development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS We included 152 studies: 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machines (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forests (n = 73/522, 14% [95% CI 11.3-17.2]). Values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION Our review revealed that focus is required on the handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION PROSPERO, CRD42019161764.
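Given that calibration was missing from nearly 95% of the models reviewed above, it is worth noting how little code reporting it takes. A minimal sketch with scikit-learn, using synthetic data and a generic logistic model (not taken from any reviewed study):

```python
# Sketch: report discrimination (AUC) and calibration side by side.
# Data and model are synthetic placeholders, for illustration only.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

auc = roc_auc_score(y_te, p)                                  # discrimination
frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)   # calibration curve
citl = p.mean() - y_te.mean()   # calibration-in-the-large: mean risk vs event rate
print(f"AUC={auc:.3f}, calibration-in-the-large={citl:+.3f}")
```

Reporting the calibration curve (or at least calibration-in-the-large) alongside the AUC addresses the gap the review identifies.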
Affiliation(s)
- Constanza L Andaur Navarro: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Johanna A A Damen: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Toshihiko Takada: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Steven W J Nijman: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Paula Dhiman: Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jie Ma: Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Gary S Collins: Center for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Sciences, University of Oxford, Oxford, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Ram Bajpai: Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
- Richard D Riley: Centre for Prognosis Research, School of Medicine, Keele University, Keele, UK
- Karel G M Moons: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands; Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands

41
Tan WY, Hargreaves C, Chen C, Hilal S. A Machine Learning Approach for Early Diagnosis of Cognitive Impairment Using Population-Based Data. J Alzheimers Dis 2023; 91:449-461. [PMID: 36442196 PMCID: PMC9881033 DOI: 10.3233/jad-220776] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
BACKGROUND The major mechanisms of dementia and cognitive impairment are vascular and neurodegenerative processes. Early diagnosis of cognitive impairment can facilitate timely interventions to mitigate progression. OBJECTIVE This study aims to develop a reliable machine learning (ML) model using socio-demographics, vascular risk factors, and structural neuroimaging markers for early diagnosis of cognitive impairment in a multi-ethnic Asian population. METHODS The study consisted of 911 participants from the Epidemiology of Dementia in Singapore study (aged 60-88 years, 49.6% male). Three ML classifiers, logistic regression, support vector machine, and gradient boosting machine, were developed. The predictions of the independent classifiers were combined in a final ensemble model. Model performance was evaluated on test data using the F1 score and the area under the receiver operating characteristic curve (AUC). After modelling, SHapley Additive exPlanations (SHAP) was applied to the prediction results to identify the predictors that contribute most to the cognitive impairment prediction. FINDINGS The final ensemble model achieved an F1 score and AUC of 0.87 and 0.80, respectively. The accuracy (0.83), sensitivity (0.86), specificity (0.74), and predictive values (positive 0.88, negative 0.72) of the ensemble model were higher than those of the independent classifiers. Age, ethnicity, highest education attainment, and neuroimaging markers were identified as important predictors of cognitive impairment. CONCLUSION This study demonstrates the feasibility of using ML tools to integrate multiple domains of data for reliable diagnosis of early cognitive impairment. The ML model uses easy-to-obtain variables and is scalable for screening individuals with a high risk of developing dementia in a population-based setting.
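The ensemble described above (logistic regression, support vector machine, and gradient boosting combined, evaluated with F1 and AUC) can be sketched with scikit-learn's soft-voting classifier. This is an illustrative reconstruction on synthetic data, not the study's code, data, or exact combination rule:

```python
# Sketch of a three-model soft-voting ensemble, assuming synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Soft voting averages the predicted probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),   # probability=True enables predict_proba
        ("gbm", GradientBoostingClassifier()),
    ],
    voting="soft",
).fit(X_tr, y_tr)

p = ensemble.predict_proba(X_te)[:, 1]
f1 = f1_score(y_te, ensemble.predict(X_te))
auc = roc_auc_score(y_te, p)
print(f"F1={f1:.2f}, AUC={auc:.2f}")
```

SHAP values, as used in the study, would then be computed on the fitted base models or ensemble with an explainer from the `shap` package.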
Affiliation(s)
- Wei Ying Tan: Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore; Institute of Data Science, National University of Singapore, Singapore
- Carol Hargreaves: Data Analytics Consulting Centre, Faculty of Science, National University of Singapore, Singapore
- Christopher Chen: Department of Pharmacology, National University of Singapore, Singapore; Memory Aging and Cognition Center, National University Health System, Singapore
- Saima Hilal (corresponding author): Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore; Department of Pharmacology, National University of Singapore, Singapore; Memory Aging and Cognition Center, National University Health System, Singapore

42
Nicol ED, Weir-McCall JR, Shaw LJ, Williamson E. Great debates in cardiac computed tomography: OPINION: "Artificial intelligence and the future of cardiovascular CT - Managing expectation and challenging hype". J Cardiovasc Comput Tomogr 2023; 17:11-17. [PMID: 35977872 DOI: 10.1016/j.jcct.2022.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 06/30/2022] [Accepted: 07/16/2022] [Indexed: 10/17/2022]
Abstract
This manuscript has been written as a follow-up to the "AI/ML great debate" featured at the 2021 Society of Cardiovascular Computed Tomography (SCCT) Annual Scientific Meeting. In debate style, we highlight the need for expectation management of AI/ML, debunk the hype around current AI techniques, and counter the argument that, in its current form, AI/ML is the "silver bullet" for the interpretation of daily clinical CCTA practice.
Affiliation(s)
- Edward D Nicol: Departments of Cardiology and Radiology, Royal Brompton Hospital, Guys and St Thomas' NHS Foundation Trust, London, UK; School of Biomedical Engineering and Imaging Sciences, King's College, London, UK
- Jonathan R Weir-McCall: School of Clinical Medicine, University of Cambridge, Cambridge, UK; Department of Radiology, Royal Papworth Hospital, Cambridge, UK
- Leslee J Shaw: The Mount Sinai Hospital, 1468 Madison Ave, New York, NY 10029, United States

43
Zagidullin B, Pasanen A, Loukovaara M, Bützow R, Tang J. Interpretable prognostic modeling of endometrial cancer. Sci Rep 2022; 12:21543. [PMID: 36513790 PMCID: PMC9747711 DOI: 10.1038/s41598-022-26134-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022] Open
Abstract
Endometrial carcinoma (EC) is one of the most common gynecological cancers in the world. In this work we apply Cox proportional hazards (CPH) and optimal survival tree (OST) algorithms to the retrospective prognostic modeling of disease-specific survival in 842 EC patients. We demonstrate that linear CPH models are preferred for the EC risk assessment based on clinical features alone, while interpretable, non-linear OST models are favored when patient profiles can be supplemented with additional biomarker data. We show how visually interpretable tree models can help generate and explore novel research hypotheses by studying the OST decision path structure, in which L1 cell adhesion molecule expression and estrogen receptor status are correctly indicated as important risk factors in the p53 abnormal EC subgroup. To aid further clinical adoption of advanced machine learning techniques, we stress the importance of quantifying model discrimination and calibration performance in the development of explainable clinical prediction models.
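The closing point about quantifying model discrimination applies to survival models such as the CPH and OST models above; the standard discrimination measure for time-to-event predictions is Harrell's concordance index. A self-contained sketch on toy data (not from the study):

```python
# Harrell's concordance index (C-index) for survival predictions, by hand.
# Toy data only, for illustration; real analyses handle ties and censoring
# with established packages (e.g. lifelines or scikit-survival).
def concordance_index(times, events, risk_scores):
    """Among usable pairs, the fraction where the subject who fails
    earlier has the higher predicted risk (score ties count as 0.5)."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable if subject i has an observed event before time j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable

# Higher risk score -> earlier event: perfectly concordant.
times = [2, 4, 6, 8]
events = [1, 1, 1, 0]          # last subject censored
risks = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risks))  # -> 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.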
Affiliation(s)
- Bulat Zagidullin: Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland; Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, 00290, Helsinki, Finland
- Annukka Pasanen: Department of Pathology, University of Helsinki and Helsinki University Hospital, 00290, Helsinki, Finland
- Mikko Loukovaara: Department of Obstetrics and Gynecology, Helsinki University Hospital and University of Helsinki, 00290, Helsinki, Finland
- Ralf Bützow: Department of Pathology, University of Helsinki and Helsinki University Hospital, 00290, Helsinki, Finland; Department of Obstetrics and Gynecology, Helsinki University Hospital and University of Helsinki, 00290, Helsinki, Finland; Research Program in Applied Tumor Genomics, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland
- Jing Tang: Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland; Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland

44
Diniz MA. Statistical methods for validation of predictive models. J Nucl Cardiol 2022; 29:3248-3255. [PMID: 35610537 DOI: 10.1007/s12350-022-02994-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2022] [Accepted: 04/08/2022] [Indexed: 01/18/2023]
Abstract
Predictive models are widely used in clinical practice. Despite the increasing number of published AI systems, recent systematic reviews have identified a lack of statistical rigor in the development and validation of predictive models. This work reviews the current literature on predictive performance measures and resampling methods, and discusses common pitfalls.
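One of the resampling methods such a review typically covers is bootstrap optimism correction for internal validation (commonly attributed to Harrell). A minimal sketch, assuming synthetic data and a generic logistic model as placeholders:

```python
# Sketch: optimism-corrected AUC via the bootstrap (illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

def fit_auc(X_fit, y_fit, X_eval, y_eval):
    m = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return roc_auc_score(y_eval, m.predict_proba(X_eval)[:, 1])

apparent = fit_auc(X, y, X, y)   # apparent (optimistic) performance

# Optimism = mean over bootstrap samples of (AUC of the bootstrap model on
# its own sample) minus (AUC of that model on the original data).
optimism = []
for _ in range(50):
    idx = rng.integers(0, len(y), len(y))        # sample with replacement
    boot_app = fit_auc(X[idx], y[idx], X[idx], y[idx])
    boot_test = fit_auc(X[idx], y[idx], X, y)
    optimism.append(boot_app - boot_test)

corrected = apparent - float(np.mean(optimism))
print(f"apparent={apparent:.3f}, optimism-corrected={corrected:.3f}")
```

The corrected estimate approximates how the model would perform on new data from the same population, without holding any data out.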
Affiliation(s)
- Marcio Augusto Diniz: Biostatistics and Bioinformatics Research Center, Cedars-Sinai Medical Center, Los Angeles, USA

45
Yang Q, Fan X, Cao X, Hao W, Lu J, Wei J, Tian J, Yin M, Ge L. Reporting and risk of bias of prediction models based on machine learning methods in preterm birth: A systematic review. Acta Obstet Gynecol Scand 2022; 102:7-14. [PMID: 36397723 PMCID: PMC9780725 DOI: 10.1111/aogs.14475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/27/2022] [Accepted: 10/04/2022] [Indexed: 11/19/2022]
Abstract
INTRODUCTION There was limited evidence on the reporting and methodological quality of prediction models using machine learning methods in preterm birth. This systematic review aimed to assess the reporting quality and risk of bias of machine learning-based prediction models in preterm birth. MATERIAL AND METHODS We conducted a systematic review, searching PubMed, Embase, the Cochrane Library, China National Knowledge Infrastructure, China Biology Medicine disk, VIP Database, and WanFang Data from inception to September 27, 2021. Studies that developed (or validated) a prediction model using machine learning methods in preterm birth were included. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the reporting quality and the risk of bias of included studies, respectively. Findings were summarized using descriptive statistics and visual plots. The protocol was registered in PROSPERO (no. CRD42022301623). RESULTS Twenty-nine studies met the inclusion criteria: 24 development-only studies and 5 development-with-validation studies. Overall, TRIPOD adherence per study ranged from 17% to 79%, with a median adherence of 49%. The reporting of the title, abstract, blinding of predictors, sample size justification, explanation of the model, and model performance was mostly poor, with TRIPOD adherence ranging from 4% to 17%. Of all included studies, 79% had a high overall risk of bias and 21% an unclear overall risk of bias. The analysis domain was most commonly rated as high risk of bias, mainly as a result of small effective sample sizes, selection of predictors based on univariable analysis, and lack of calibration evaluation. CONCLUSIONS The reporting and methodological quality of machine learning-based prediction models in preterm birth were poor. It is urgent to improve the design, conduct, and reporting of such studies to support the application of machine learning-based prediction models for preterm birth in clinical practice.
Affiliation(s)
- Qiuyu Yang: Evidence-Based Nursing Center, School of Nursing, Lanzhou University, Lanzhou, China
- Xia Fan: Department of Obstetrics and Gynecology, The Second School of Clinical Medicine, Shanxi University of Chinese Medicine, Shanxi, China
- Xiao Cao: Evidence-Based Nursing Center, School of Nursing, Lanzhou University, Lanzhou, China
- Weijie Hao: Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
- Jiale Lu: Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
- Jia Wei: Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
- Jinhui Tian: Key Laboratory of Evidence Based Medicine and Knowledge Translation of Gansu Province, Lanzhou, China; Evidence-Based Medicine Center, School of Basic Medicine Science, Lanzhou University, Lanzhou, China
- Min Yin: Health Examination Center, The First Hospital of Lanzhou University, Lanzhou, China
- Long Ge: Evidence-Based Social Science Research Center, and Department of Social Medicine and Health Management, School of Public Health, Lanzhou University, Lanzhou, China

46
Quality assessment of machine learning models for diagnostic imaging in orthopaedics: A systematic review. Artif Intell Med 2022; 132:102396. [DOI: 10.1016/j.artmed.2022.102396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Revised: 08/30/2022] [Accepted: 08/30/2022] [Indexed: 01/17/2023]
47
Parr H, Hall E, Porta N. Joint models for dynamic prediction in localised prostate cancer: a literature review. BMC Med Res Methodol 2022; 22:245. [PMID: 36123621 PMCID: PMC9487103 DOI: 10.1186/s12874-022-01709-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Accepted: 08/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prostate cancer is a very prevalent disease in men. Patients are monitored regularly during and after treatment with repeated assessment of prostate-specific antigen (PSA) levels. The prognosis of localised prostate cancer is generally good after treatment, and the risk of recurrence is usually estimated from factors measured at diagnosis. Incorporating PSA measurements over time in a dynamic-prediction joint model enables a patient's risk to be updated as new information becomes available. We review joint-model strategies that have been applied to model time-dependent PSA trajectories to predict time-to-event outcomes in localised prostate cancer. METHODS We identified articles from the last two decades that developed joint models for prediction of localised prostate cancer recurrence. We report, compare, and summarise the methodological approaches and applications that use joint modelling accounting for two processes: the longitudinal model (PSA) and the time-to-event process (clinical failure). The methods explored differ in how they specify the association between these two processes. RESULTS Twelve relevant articles were identified. A range of methodological frameworks were found, and we describe in detail shared-parameter joint models (9 of 12, 75%) and joint latent class models (3 of 12, 25%). Within each framework, these articles presented model development, estimation of dynamic predictions, and model validation. CONCLUSIONS Each framework has its own principles, with corresponding advantages and differing interpretations. Regardless of the framework used, dynamic prediction models enable real-time prediction of individual patient prognosis. They utilise all available longitudinal information in addition to baseline prognostic risk factors and are superior to traditional baseline-only prediction models.
Affiliation(s)
- Harry Parr: Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
- Emma Hall: Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK
- Nuria Porta: Clinical Trials and Statistics Unit at The Institute of Cancer Research, London, UK

48
Kothari R, Chiu C, Moukheiber M, Jehiro M, Bishara A, Lee C, Piracchio R, Celi LA. A descriptive appraisal of quality of reporting in a cohort of machine learning studies in anesthesiology. Anaesth Crit Care Pain Med 2022; 41:101126. [PMID: 35811037 DOI: 10.1016/j.accpm.2022.101126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND The field of machine learning is employed more and more in medicine. However, published studies have frequently been shown to lack completeness and adherence to published reporting guidelines. This assessment has not been done in the subspecialty of anesthesiology. METHODS We appraised the quality of reporting of a convenience sample of 67 peer-reviewed publications sourced from the scoping review by Hashimoto et al. Each publication was appraised on the presence of reporting elements (reporting compliance) selected from four peer-reviewed guidelines for reporting on machine learning studies. Results are described in several cross-sections, including by section of manuscript (e.g., abstract, introduction), year of publication, impact factor of journal, and impact of publication. RESULTS On average, reporting compliance was 64% ± 13%. There was marked heterogeneity of reporting across manuscript sections. There was a mild trend towards higher quality of reporting with increasing journal impact factor and increasing average number of citations per year since publication, and no trend with recency of publication. CONCLUSION The quality of reporting of machine learning studies in anesthesiology is on par with other fields but can benefit from improvement, especially in presenting methodology, results, and discussion points, including interpretation of models and their pitfalls. Clinicians in today's learning health systems will benefit from skills in appraisal of evidence. Several reporting guidelines have been released, and updates to mainstream guidelines are under development, which we hope will usher in improved reporting quality.
Affiliation(s)
- Rishi Kothari, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA 4143, USA
- Catherine Chiu, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA 4143, USA
- Mira Moukheiber, Picower Institute for Learning & Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Matthew Jehiro, Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY 14260, USA
- Andrew Bishara, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA 4143, USA
- Christine Lee, Edwards Lifesciences, Critical Care, Irvine, CA 92614, USA
- Romain Piracchio, Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA 4143, USA
- Leo Anthony Celi, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA

49
Chopannejad S, Sadoughi F, Bagherzadeh R, Shekarchi S. Predicting major adverse cardiovascular events in acute coronary syndrome: A scoping review of machine learning approaches. Appl Clin Inform 2022; 13:720-740. [PMID: 35617971 PMCID: PMC9329142 DOI: 10.1055/a-1863-1589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022] Open
Abstract
BACKGROUND Acute coronary syndrome is the leading cause of death worldwide; it is therefore necessary to predict major adverse cardiovascular events and cardiovascular deaths in patients with acute coronary syndrome so that correct and timely clinical decisions can be made. OBJECTIVE This review aimed to identify the algorithms and important predictor variables used in studies applying machine learning to predict major adverse cardiovascular events in patients with acute coronary syndrome. METHODS The Preferred Reporting Items for Scoping Reviews (PRISMA-ScR) guidelines were followed. PubMed, Embase, Web of Science, Scopus, Springer, and IEEE Xplore were searched for articles published between 2005 and 2021. The findings are presented as a narrative synthesis of the evidence. RESULTS Fourteen (63.64%) studies did not perform external validation and used registry data only. The algorithms used included, among others, logistic regression, random forest, boosting ensembles, non-boosting ensembles, decision trees, and naive Bayes. Multiple studies (n = 20) achieved a high area under the ROC curve, between 0.80 and 0.99, in predicting mortality and major adverse cardiovascular events. The predictor variables used in these studies were divided into demographic, clinical, and therapeutic features. However, no study reported the integration of a machine learning model into clinical practice. CONCLUSION Machine learning algorithms yielded acceptable results for predicting major adverse cardiovascular events and mortality in patients with acute coronary syndrome, but these approaches have yet to be integrated into clinical practice. Further research is required to develop feasible and effective machine learning prediction models and to measure their potentially important implications for optimizing quality of care in patients with acute coronary syndrome.
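The area under the ROC curve reported by these studies has a simple rank interpretation that needs no machine learning library: it is the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, counting ties as one half. A minimal sketch (the labels and risk scores below are invented for illustration):

```python
def roc_auc(labels, scores):
    """AUC via the rank (Mann-Whitney U) formulation: the probability that
    a random positive outranks a random negative, ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for 4 events (1) and 4 non-events (0)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]
print(f"AUC = {roc_auc(labels, scores):.2f}")  # prints: AUC = 0.94
```

This pairwise definition is exactly what the 0.80 to 0.99 range in the review quantifies: discrimination between events and non-events, independent of any risk threshold.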
Affiliation(s)
- Sara Chopannejad, Student Research Committee, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Farahnaz Sadoughi, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Rafat Bagherzadeh, English Language Department, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
- Sakineh Shekarchi, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran

50
Abdulaziz KE, Perry JJ, Yadav K, Dowlatshahi D, Stiell IG, Wells GA, Taljaard M. Quality and transparency of reporting derivation and validation prognostic studies of recurrent stroke in patients with TIA and minor stroke: a systematic review. Diagn Progn Res 2022; 6:9. [PMID: 35585563 PMCID: PMC9118704 DOI: 10.1186/s41512-022-00123-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Accepted: 03/01/2022] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Clinical prediction models and scores help clinicians make optimal evidence-based decisions when caring for their patients. To critically appraise such prediction models for use in a clinical setting, essential information on the derivation and validation of the models needs to be transparently reported. In this systematic review, we assessed the quality of reporting of derivation and validation studies of prediction models for the prognosis of recurrent stroke in patients with transient ischemic attack (TIA) or minor stroke. METHODS MEDLINE and EMBASE databases were searched up to February 04, 2020. Studies reporting development or validation of multivariable prognostic models predicting recurrent stroke within 90 days in patients with TIA or minor stroke were included. Included studies were appraised for reporting quality and conduct using a select list of items from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. RESULTS After screening 7026 articles, 60 eligible articles were retained, comprising 100 derivation and validation studies of 27 unique prediction models. Four models were newly derived, while 23 were developed by validating and updating existing models. Of the 60 articles, 15 (25%) reported an informative title. Among the 100 derivation and validation studies, few reported whether assessment of the outcome (24%) and predictors (12%) was blinded. Similarly, sample size justifications (49%), description of methods for handling missing data (16.1%), and model calibration (5%) were seldom reported. Among the 96 validation studies, 17 (17.7%) clearly reported on similarity (in terms of setting, eligibility criteria, predictors, and outcomes) between the validation and derivation datasets. Items with the highest prevalence of adherence were source of data (99%), eligibility criteria (93%), measures of discrimination (81%), and study setting (65%). CONCLUSIONS The majority of derivation and validation studies for the prognosis of recurrent stroke in patients with TIA or minor stroke suffer from poor reporting quality. We recommend that all prediction model derivation and validation studies follow the TRIPOD statement to improve transparency and promote uptake of more reliable prediction models in practice. TRIAL REGISTRATION The protocol for this review was registered with PROSPERO (registration number CRD42020201130).
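The per-item adherence prevalences quoted above (source of data 99%, calibration 5%, and so on) are the transpose of a per-study compliance score: for each TRIPOD item, the fraction of reviewed studies that report it. A minimal sketch with a hypothetical checklist matrix (item names and values are invented for illustration, not taken from the review):

```python
# Hypothetical TRIPOD-style checklist: rows are studies, columns are items.
studies = [
    {"data_source": True, "eligibility": True,  "calibration": False, "blinding": False},
    {"data_source": True, "eligibility": True,  "calibration": False, "blinding": True},
    {"data_source": True, "eligibility": False, "calibration": True,  "blinding": False},
]

# Per-item adherence prevalence: share of studies reporting each item.
items = studies[0].keys()
prevalence = {item: sum(s[item] for s in studies) / len(studies) for item in items}

for item, p in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {p:.0%}")
```

Ranking items this way immediately surfaces the under-reported elements (here, calibration and blinding) that reviews like this one flag for improvement.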
Affiliation(s)
- Kasim E. Abdulaziz, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Jeffrey J. Perry, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Krishan Yadav, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Dar Dowlatshahi, School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Medicine (Neurology), University of Ottawa, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
- Ian G. Stiell, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada
- George A. Wells, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Cardiovascular Research Methods Centre, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Monica Taljaard, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada