1
Crowley RJ, Tan YJ, Ioannidis JPA. Empirical assessment of bias in machine learning diagnostic test accuracy studies. J Am Med Inform Assoc 2020; 27:1092-1101. [PMID: 32548642 PMCID: PMC7647361 DOI: 10.1093/jamia/ocaa075] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/17/2020] [Revised: 04/12/2020] [Accepted: 04/24/2020] [Indexed: 12/29/2022]
Abstract
OBJECTIVE Machine learning (ML) diagnostic tools have significant potential to improve health care. However, methodological pitfalls may affect diagnostic test accuracy studies used to appraise such tools. We aimed to evaluate the prevalence and reporting of design characteristics within the literature. Further, we sought to empirically assess whether design features may be associated with different estimates of diagnostic accuracy. MATERIALS AND METHODS We systematically retrieved 2 × 2 tables (n = 281) describing the performance of ML diagnostic tools, derived from 114 publications in 38 meta-analyses, from PubMed. Data extracted included test performance, sample sizes, and design features. A mixed-effects metaregression was run to quantify the association between design features and diagnostic accuracy. RESULTS Participant ethnicity and blinding in test interpretation were unreported in 90% and 60% of studies, respectively. Reporting was occasionally lacking for rudimentary characteristics such as study design (28% unreported). Internal validation without appropriate safeguards was used in 44% of studies. Several design features were associated with larger estimates of accuracy, including having unreported (relative diagnostic odds ratio [RDOR], 2.11; 95% confidence interval [CI], 1.43-3.10) or case-control study designs (RDOR, 1.27; 95% CI, 0.97-1.66), and recruiting participants for the index test (RDOR, 1.67; 95% CI, 1.08-2.59). DISCUSSION Significant underreporting of experimental details was present. Study design features may affect estimates of diagnostic performance in the ML diagnostic test accuracy literature. CONCLUSIONS The present study identifies pitfalls that threaten the validity, generalizability, and clinical value of ML diagnostic tools and provides recommendations for improvement.
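The RDOR intervals quoted in this abstract can be read on the log scale, where ratio measures are usually modelled. The sketch below is illustrative only and is not the authors' metaregression: it assumes each interval is a Wald-type 95% CI that is symmetric in log(RDOR), and the function name `log_scale_summary` is hypothetical. Under that assumption it backs out the log-scale standard error, checks that the point estimate sits near the geometric mean of the bounds, and derives an approximate z statistic.

```python
import math

def log_scale_summary(rdor, lower, upper, z_crit=1.96):
    """Back out log-scale quantities from a ratio estimate and its 95% CI.

    Assumes (not stated in the abstract) a Wald interval symmetric in log(RDOR).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the CI bounds
    z_stat = math.log(rdor) / se_log     # distance of log(RDOR) from 0 in SE units
    return {"se_log": se_log, "geometric_midpoint": midpoint, "z": z_stat}

# Values copied from the abstract: (RDOR, lower bound, upper bound)
reported = {
    "unreported study design": (2.11, 1.43, 3.10),
    "case-control design": (1.27, 0.97, 1.66),
    "recruited for index test": (1.67, 1.08, 2.59),
}

for label, (rdor, lower, upper) in reported.items():
    s = log_scale_summary(rdor, lower, upper)
    print(f"{label}: SE(log RDOR)={s['se_log']:.3f}, "
          f"midpoint={s['geometric_midpoint']:.2f}, z={s['z']:.2f}")
```

An interval that includes 1, as for the case-control design, corresponds to a z statistic below 1.96 under this approximation.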
Affiliation(s)
- Ryan J Crowley
  - Meta-Research Innovation Center at Stanford, Stanford University, Stanford, California, USA
  - Department of Bioengineering, Stanford School of Engineering, Stanford University, Stanford, California, USA
- Yuan Jin Tan
  - Meta-Research Innovation Center at Stanford, Stanford University, Stanford, California, USA
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, California, USA
- John P A Ioannidis
  - Meta-Research Innovation Center at Stanford, Stanford University, Stanford, California, USA
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, California, USA
  - Stanford Prevention Research Center, Department of Medicine, Stanford Medicine, Stanford University, Stanford, California, USA
  - Department of Biomedical Data Science, Stanford Medicine, Stanford University, Stanford, California, USA
  - Department of Statistics, School of Humanities and Science, Stanford University, Stanford, California, USA
2
El-Samadony H, Azzazy HME, Tageldin MA, Ashour ME, Deraz IM, Elmaghraby T. Nanogold Assay Improves Accuracy of Conventional TB Diagnostics. Lung 2019; 197:241-247. [PMID: 30610370 DOI: 10.1007/s00408-018-00194-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2018] [Accepted: 12/30/2018] [Indexed: 11/24/2022]
Abstract
PURPOSE TB nanodiagnostics have witnessed considerable development. However, most of the published reports have not proceeded beyond proof-of-concept. Our objectives are to evaluate the diagnostic accuracy of a novel nanogold assay in detecting patients with active pulmonary TB based on results of BACTEC MGIT (reference test), and to compare its clinical performance to the combined use of sputum smear microscopy (SSM) with chest X-ray (CXR). METHODS This is a case-control study that involved 20 patients with active TB, 20 non-TB chest patients with a previous history of TB infection, and 20 non-TB chest patients without a previous history of TB infection. RESULTS Sensitivity and specificity of the TB nanogold assay were 95% and 100%, respectively, with a diagnostic odds ratio (DOR) of 1053.0. ROC curve analysis yielded an area under the curve (AUC) of 0.975. The TB nanogold assay outperformed the combined use of SSM with CXR. The DOR and AUC differences were 996.0 and 0.125, respectively. CONCLUSIONS Our study shows that the TB nanogold assay is accurate, rapid, and holds the potential for use as an add-on initial test to improve the accuracy of SSM and CXR in detecting patients with active pulmonary TB in developing countries. Future studies should involve larger sample sizes for further assessment of test accuracy.
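A finite DOR alongside 100% specificity implies some adjustment for the empty false-positive cell, since the plain formula would divide by zero. The sketch below is a hedged reconstruction, not the authors' stated calculation. It assumes the 2 × 2 counts implied by the abstract (19 of 20 TB patients test positive, all 40 non-TB patients test negative) and assumes a 0.5 continuity correction added to every cell, which the abstract does not mention. The function name `diagnostic_odds_ratio` is hypothetical.

```python
def diagnostic_odds_ratio(tp, fn, fp, tn, correction=0.5):
    """DOR = (TP * TN) / (FP * FN).

    If any cell is zero, `correction` is added to every cell first
    (an assumed continuity correction, not confirmed by the abstract).
    """
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (cell + correction for cell in (tp, fn, fp, tn))
    return (tp * tn) / (fp * fn)

# Counts inferred from the abstract: 20 cases at 95% sensitivity,
# 40 controls at 100% specificity. These are assumptions, not reported cells.
tp, fn = 19, 1
fp, tn = 0, 40

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
dor = diagnostic_odds_ratio(tp, fn, fp, tn)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, DOR={dor:.1f}")
```

With these assumed counts the corrected ratio is (19.5 × 40.5) / (0.5 × 1.5), which agrees with the DOR reported in the abstract.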
Affiliation(s)
- Hesham El-Samadony
  - Abbassia Chest Hospital, Ministry of Health, 6 El-Sekka El-Baydaa St, Nasr City, Cairo, 11759, Egypt
  - Department of Chest Diseases, Faculty of Medicine, Al-Azhar University, Cairo, Egypt
- Hassan M E Azzazy
  - Department of Chemistry, School of Sciences & Engineering, The American University in Cairo, P.O. Box 74, New Cairo, 11835, Egypt
- Mohamed Awad Tageldin
  - Department of Chest Diseases, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Mahmoud E Ashour
  - Department of Chest Diseases, Faculty of Medicine, Al-Azhar University, Cairo, Egypt
- Ibrahim M Deraz
  - Department of Chest Diseases, Faculty of Medicine, Al-Azhar University, Cairo, Egypt
- Tarek Elmaghraby
  - Department of Molecular Biology, National Center for Radiation Research and Technology, Atomic Energy Authority, Cairo, Egypt
3
Lefebvre C, Glanville J, Beale S, Boachie C, Duffy S, Fraser C, Harbour J, McCool R, Smith L. Assessing the performance of methodological search filters to improve the efficiency of evidence information retrieval: five literature reviews and a qualitative study. Health Technol Assess 2018; 21:1-148. [PMID: 29188764 DOI: 10.3310/hta21690] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 01/07/2023]
Abstract
BACKGROUND Effective study identification is essential for conducting health research, developing clinical guidance and health policy and supporting health-care decision-making. Methodological search filters (combinations of search terms to capture a specific study design) can assist in searching to achieve this. OBJECTIVES This project investigated the methods used to assess the performance of methodological search filters, the information that searchers require when choosing search filters and how that information could be better provided. METHODS Five literature reviews were undertaken in 2010/11: search filter development and testing; comparison of search filters; decision-making in choosing search filters; diagnostic test accuracy (DTA) study methods; and decision-making in choosing diagnostic tests. We conducted interviews and a questionnaire with experienced searchers to learn what information assists in the choice of search filters and how filters are used. These investigations informed the development of various approaches to gathering and reporting search filter performance data. We acknowledge that there has been a regrettable delay between carrying out the project, including the searches, and the publication of this report, because of serious illness of the principal investigator. RESULTS The development of filters most frequently involved using a reference standard derived from hand-searching journals. Most filters were validated internally only. Reporting of methods was generally poor. Sensitivity, precision and specificity were the most commonly reported performance measures and were presented in tables. Aspects of DTA study methods are applicable to search filters, particularly in the development of the reference standard. There is limited evidence on how clinicians choose between diagnostic tests. No published literature was found on how searchers select filters. 
Interviewing and questioning searchers via a questionnaire found that filters were not appropriate for all tasks but were predominantly used to reduce large numbers of retrieved records and to introduce focus. The Inter Technology Appraisal Support Collaboration (InterTASC) Information Specialists' Sub-Group (ISSG) Search Filters Resource was most frequently mentioned by both groups as the resource consulted to select a filter. Randomised controlled trial (RCT) and systematic review filters, in particular the Cochrane RCT and the McMaster Hedges filters, were most frequently mentioned. The majority indicated that they used different filters depending on the requirement for sensitivity or precision. Over half of the respondents used the filters available in databases. Interviewees used various approaches when using and adapting search filters. Respondents suggested that the main factors that would make choosing a filter easier were the availability of critical appraisals and more detailed performance information. Provenance and having the filter available in a central storage location were also important. LIMITATIONS The questionnaire could have been shorter and could have included more multiple choice questions, and the reviews of filter performance focused on only four study designs. CONCLUSIONS Search filter studies should use a representative reference standard and explicitly report methods and results. Performance measures should be presented systematically and clearly. Searchers find filters useful in certain circumstances but expressed a need for more user-friendly performance information to aid filter choice. We suggest approaches to use, adapt and report search filter performance. Future work could include research around search filters and performance measures for study designs not addressed here, exploration of alternative methods of displaying performance results and numerical synthesis of performance comparison results. 
FUNDING The National Institute for Health Research (NIHR) Health Technology Assessment programme and Medical Research Council-NIHR Methodology Research Programme (grant number G0901496).
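The three performance measures named in this abstract (sensitivity, precision, and specificity) can be computed by treating a search filter like a diagnostic test scored against a hand-searched reference standard. The sketch below uses invented counts purely for illustration. The function name `filter_performance` and all numbers are hypothetical and are not taken from the report.

```python
def filter_performance(relevant_retrieved, relevant_missed,
                       irrelevant_retrieved, irrelevant_excluded):
    """Score a search filter against a reference standard of relevant records."""
    sensitivity = relevant_retrieved / (relevant_retrieved + relevant_missed)
    precision = relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)
    specificity = irrelevant_excluded / (irrelevant_excluded + irrelevant_retrieved)
    return {"sensitivity": sensitivity,
            "precision": precision,
            "specificity": specificity}

# Hypothetical example: 10,000 records, 200 of them relevant by hand-searching.
# The filter retrieves 1,500 records, of which 180 are relevant.
metrics = filter_performance(
    relevant_retrieved=180,
    relevant_missed=20,
    irrelevant_retrieved=1320,
    irrelevant_excluded=8480,
)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the filter is sensitive but imprecise, which illustrates the trade-off that respondents described when choosing between filters.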
Affiliation(s)
- Carol Lefebvre
  - UK Cochrane Centre, Oxford, UK
  - Lefebvre Associates Ltd, Oxford, UK
- Charles Boachie
  - Health Services Research Unit, University of Aberdeen, Aberdeen, UK
- Cynthia Fraser
  - Health Services Research Unit, University of Aberdeen, Aberdeen, UK
- Lynne Smith
  - Healthcare Improvement Scotland, Glasgow, UK
4
Clinical trials registries are underused in the pregnancy and childbirth literature: a systematic review of the top 20 journals. BMC Res Notes 2016; 9:475. [PMID: 27769265 PMCID: PMC5073738 DOI: 10.1186/s13104-016-2280-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/15/2016] [Accepted: 10/12/2016] [Indexed: 12/24/2022]
Abstract
Background Systematic reviews and meta-analyses that do not include unpublished data in their analyses may be prone to publication bias, which in some cases has been shown to have deleterious consequences on determining the efficacy of interventions. Methods We retrieved systematic reviews and meta-analyses published in the past 8 years (January 1, 2007–December 31, 2015) from the top 20 journals in the Pregnancy and Childbirth literature, as rated by Google Scholar’s h5-index. A meta-epidemiologic analysis was performed to determine the frequency with which authors searched clinical trials registries for unpublished data. Results A PubMed search retrieved 372 citations, 297 of which were deemed to be either a systematic review or a meta-analysis and were included for analysis. Twelve (4%) of these searched at least one WHO-approved clinical trials registry or clinicaltrials.gov. Conclusion Systematic reviews and meta-analyses published in pregnancy and childbirth journals do not routinely report searches of clinical trials registries. Including these registries in systematic reviews may be a promising avenue to limit publication bias if registry searches locate unpublished trial data that could be used in the systematic review.
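The headline proportion in this abstract (12 of 297 reviews) can be reproduced directly, and an interval around it gives a sense of its precision. The sketch below is illustrative only. The abstract reports no confidence interval, so the Wilson score interval used here is an added assumption rather than the authors' analysis, and the function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, total, z_crit=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z_crit**2 / total
    centre = (p + z_crit**2 / (2 * total)) / denom
    half_width = (z_crit * math.sqrt(p * (1 - p) / total
                                     + z_crit**2 / (4 * total**2))) / denom
    return centre - half_width, centre + half_width

searched_registry = 12   # reviews that searched at least one registry (from the abstract)
included_reviews = 297   # reviews included for analysis (from the abstract)

proportion = searched_registry / included_reviews
lower, upper = wilson_interval(searched_registry, included_reviews)

print(f"proportion={proportion:.1%}, 95% Wilson interval=({lower:.1%}, {upper:.1%})")
```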
5
Dinnes J, Bancos I, Ferrante di Ruffano L, Chortis V, Davenport C, Bayliss S, Sahdev A, Guest P, Fassnacht M, Deeks JJ, Arlt W. MANAGEMENT OF ENDOCRINE DISEASE: Imaging for the diagnosis of malignancy in incidentally discovered adrenal masses: a systematic review and meta-analysis. Eur J Endocrinol 2016; 175:R51-64. [PMID: 27257145 PMCID: PMC5065077 DOI: 10.1530/eje-16-0461] [Citation(s) in RCA: 117] [Impact Index Per Article: 13.0] [Received: 05/29/2016] [Revised: 05/29/2016] [Accepted: 06/02/2016] [Indexed: 12/13/2022]
Abstract
OBJECTIVE Adrenal masses are incidentally discovered in 5% of CT scans. In 2013/2014, 81 million CT examinations were undertaken in the USA and 5 million in the UK. However, uncertainty remains around the optimal imaging approach for diagnosing malignancy. We aimed to review the evidence on the accuracy of imaging tests for differentiating malignant from benign adrenal masses. DESIGN A systematic review and meta-analysis was conducted. METHODS We searched MEDLINE, EMBASE, Cochrane CENTRAL Register of Controlled Trials, Science Citation Index, Conference Proceedings Citation Index, and ZETOC (January 1990 to August 2015). We included studies evaluating the accuracy of CT, MRI, or 18F-fluoro-deoxyglucose (FDG)-PET compared with an adequate histological or imaging-based follow-up reference standard. RESULTS We identified 37 studies suitable for inclusion, after screening 5469 references and 525 full-text articles. Studies evaluated the accuracy of CT (n=16), MRI (n=15), and FDG-PET (n=9) and were generally small and at high or unclear risk of bias. Only 19 studies were eligible for meta-analysis. Limited data suggest that CT density >10 HU has high sensitivity for detection of adrenal malignancy in participants with no prior indication for adrenal imaging; that is, masses with density ≤10 HU are unlikely to be malignant. All other estimates of test performance are based on numbers that are too small. CONCLUSIONS Despite their widespread use in routine assessment, there is insufficient evidence for the diagnostic value of individual imaging tests in distinguishing benign from malignant adrenal masses. Future research is urgently needed and should include prospective test validation studies for imaging and novel diagnostic approaches alongside detailed health economics analysis.
Affiliation(s)
- Irina Bancos
  - Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK
  - Division of Endocrinology, Metabolism, Nutrition and Diabetes, Mayo Clinic, Rochester, Minnesota, USA
- Vasileios Chortis
  - Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK
- Anju Sahdev
  - Department of Imaging, St Bartholomew's Hospital, Barts Health, London, UK
- Peter Guest
  - Department of Radiology, Queen Elizabeth Hospital, University Hospital Birmingham NHS Foundation Trust, Birmingham, UK
- Martin Fassnacht
  - Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital Würzburg, University of Würzburg, Würzburg, Germany
  - Comprehensive Cancer Center Mainfranken, University of Würzburg, Würzburg, Germany
- Wiebke Arlt
  - Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK
  - Centre for Endocrinology, Diabetes and Metabolism, Birmingham Health Partners, Birmingham, UK
6
Whiting PF, Rutjes AWS, Westwood ME, Mallett S. A systematic review classifies sources of bias and variation in diagnostic test accuracy studies. J Clin Epidemiol 2013; 66:1093-104. [PMID: 23958378 DOI: 10.1016/j.jclinepi.2013.05.014] [Citation(s) in RCA: 196] [Impact Index Per Article: 16.3] [Received: 07/18/2012] [Revised: 05/08/2013] [Accepted: 05/15/2013] [Indexed: 11/15/2022]
Abstract
OBJECTIVE To classify the sources of bias and variation and to provide an updated summary of the evidence of the effects of each source of bias and variation. STUDY DESIGN AND SETTING We conducted a systematic review of studies of any design with the main objective of addressing bias or variation in the results of diagnostic accuracy studies. We searched MEDLINE, EMBASE, BIOSIS, the Cochrane Methodology Register, and Database of Abstracts of Reviews of Effects (DARE) from 2001 to October 2011. Citation searches based on three key papers were conducted, and studies from our previous review (search to 2001) were eligible. One reviewer extracted data on the study design, objective, sources of bias and/or variation, and results. A second reviewer checked the extraction. RESULTS We summarized the number of studies providing evidence of an effect arising from each source of bias and variation on the estimates of sensitivity, specificity, and overall accuracy. CONCLUSIONS We found consistent evidence for the effects of case-control design, observer variability, availability of clinical information, reference standard, partial and differential verification bias, demographic features, and disease prevalence and severity. Effects were generally stronger for sensitivity than for specificity. Evidence for other sources of bias and variation was limited.
Affiliation(s)
- Penny F Whiting
  - Kleijnen Systematic Reviews Ltd, Unit 6, Escrick Business Park, Riccall Road, Escrick, York YO19 6FD, United Kingdom
7
Methodological quality of diagnostic accuracy studies on non-invasive coronary CT angiography: influence of QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) items on sensitivity and specificity. Eur Radiol 2013; 23:1603-22. [PMID: 23322410 DOI: 10.1007/s00330-012-2763-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/21/2012] [Revised: 11/29/2012] [Accepted: 12/10/2012] [Indexed: 12/27/2022]
Abstract
OBJECTIVES To evaluate the methodological quality of diagnostic accuracy studies on coronary computed tomography (CT) angiography using the QUADAS (Quality Assessment of Diagnostic Accuracy Studies included in systematic reviews) tool. METHODS Each QUADAS item was individually defined to adapt it to the special requirements of studies on coronary CT angiography. Two independent investigators analysed 118 studies using 12 QUADAS items. Meta-regression and pooled analyses were performed to identify possible effects of methodological quality items on estimates of diagnostic accuracy. RESULTS The overall methodological quality of coronary CT studies was merely moderate. They fulfilled a median of 7.5 out of 12 items. Only 9 of the 118 studies fulfilled more than 75% of possible QUADAS items. One QUADAS item ("Uninterpretable Results") showed a significant influence (P = 0.02) on estimates of diagnostic accuracy with "no fulfilment" increasing specificity from 86% to 90%. Furthermore, pooled analysis revealed that each QUADAS item that is not fulfilled has the potential to change estimates of diagnostic accuracy. CONCLUSIONS The methodological quality of studies investigating the diagnostic accuracy of non-invasive coronary CT is only moderate and was found to affect the sensitivity and specificity. An improvement is highly desirable because good methodology is crucial for adequately assessing imaging technologies.
KEY POINTS
- Good methodological quality is a basic requirement in diagnostic accuracy studies.
- Most coronary CT angiography studies have only been of moderate design quality.
- Weak methodological quality will affect the sensitivity and specificity.
- No improvement in methodological quality was observed over time.
- Authors should consider the QUADAS checklist when undertaking accuracy studies.