1
Fuld S, Constantinescu G, Pamporaki C, Peitzsch M, Schulze M, Yang J, Müller L, Prejbisz A, Januszewicz A, Remde H, Kürzinger L, Dischinger U, Ernst M, Gruber S, Reincke M, Beuschlein F, Lenders JWM, Eisenhofer G. Screening for Primary Aldosteronism by Mass Spectrometry Versus Immunoassay Measurements of Aldosterone: A Prospective Within-Patient Study. J Appl Lab Med 2024; 9:752-766. [PMID: 38532521 DOI: 10.1093/jalm/jfae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 01/18/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND Measurements of aldosterone by mass spectrometry are more accurate and less prone to interferences than immunoassay measurements, and may produce a more accurate aldosterone:renin ratio (ARR) when screening for primary aldosteronism (PA). METHODS Differences in diagnostic performance of the ARR using mass spectrometry vs immunoassay measurements of aldosterone were examined in 710 patients screened for PA. PA was confirmed in 153 patients and excluded in 451 others. Disease classifications were not achieved in 106 patients. Areas under receiver-operating characteristic curves (AUROC) and other measures were used to compare diagnostic performance. RESULTS Mass spectrometry-based measurements yielded lower plasma aldosterone concentrations than immunoassay measurements. For the ARR based on immunoassay measurements of aldosterone, AUROCs were slightly lower (P = 0.018) than those using mass spectrometry measurements (0.895 vs 0.906). The cutoff for the ARR to reach a sensitivity of 95% was 30 and 21.5 pmol/mU by respective immunoassay and mass spectrometry-based measurements, which corresponded to specificities of 57% for both. With data restricted to patients with unilateral PA, diagnostic sensitivities of 94% with specificities >81% could be achieved at cutoffs of 68 and 52 pmol/mU for respective immunoassay and mass spectrometry measurements. CONCLUSIONS Mass spectrometry-based measurements of aldosterone for the ARR provide no clear diagnostic advantage over immunoassay-based measurements. Both approaches offer limited diagnostic accuracy for the ARR as a screening test. One solution is to employ the higher cutoffs to triage patients likely to have unilateral PA for further tests and possible adrenalectomy, while using the lower cutoffs to identify others for targeted medical therapy. German Clinical Trials Register ID: DRKS00017084.
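As a rough illustration of how a cutoff like those quoted in the abstract above can be derived, the Python sketch below picks the highest cutoff that still reaches a target sensitivity and reports the specificity at that cutoff. The function name `cutoff_for_sensitivity` and all ARR values are hypothetical and are not taken from the study. A value is assumed to screen positive when it is at or above the cutoff.

```python
def cutoff_for_sensitivity(diseased, non_diseased, target=0.95):
    """Return (cutoff, sensitivity, specificity) for the highest cutoff
    whose sensitivity is at least `target`.

    A measurement counts as screen-positive when it is >= cutoff.
    """
    # Try candidate cutoffs from highest to lowest: sensitivity can only
    # rise as the cutoff falls, so the first hit is the highest valid cutoff.
    for cutoff in sorted(set(diseased), reverse=True):
        sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
        if sensitivity >= target:
            specificity = sum(v < cutoff for v in non_diseased) / len(non_diseased)
            return cutoff, sensitivity, specificity
    raise ValueError("no cutoff reaches the target sensitivity")


# Hypothetical ARR values (pmol/mU), invented purely for illustration.
arr_confirmed = [12, 25, 30, 41, 55, 68, 70, 90, 120, 150]
arr_excluded = [3, 5, 8, 10, 14, 18, 22, 27, 35, 60]

result = cutoff_for_sensitivity(arr_confirmed, arr_excluded, target=0.9)
# result == (25, 0.9, 0.7)
```

Raising the cutoff in this toy data improves specificity at the cost of sensitivity, which is the trade-off behind the two-tier cutoff strategy proposed in the conclusions.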
Affiliation(s)
- Sybille Fuld
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Georgiana Constantinescu
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Christina Pamporaki
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Mirko Peitzsch
  - Institute of Clinical Chemistry and Laboratory Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Manuel Schulze
  - Center for Interdisciplinary Digital Sciences, Department Information Services and High Performance Computing, Technische Universität Dresden, Dresden, Germany
- Jun Yang
  - Centre for Endocrinology and Metabolism, Hudson Institute of Medical Research, Clayton, Australia
- Lisa Müller
  - Department of Medicine IV, University Hospital, Ludwig Maximilian University Munich, Munich, Germany
- Aleksander Prejbisz
  - Department of Epidemiology, Cardiovascular Prevention and Health Promotion, National Institute of Cardiology, Warsaw, Poland
- Andrzej Januszewicz
  - Department of Hypertension, National Institute of Cardiology, Warsaw, Poland
- Hanna Remde
  - Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Lydia Kürzinger
  - Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Ulrich Dischinger
  - Department of Internal Medicine I, Division of Endocrinology and Diabetes, University Hospital, University of Würzburg, Würzburg, Germany
- Matthias Ernst
  - Department of Endocrinology, Diabetology and Clinical Nutrition, University Hospital Zurich (USZ) and University of Zurich (UZH), Zurich, Switzerland
- Sven Gruber
  - Department of Endocrinology, Diabetology and Clinical Nutrition, University Hospital Zurich (USZ) and University of Zurich (UZH), Zurich, Switzerland
- Martin Reincke
  - Department of Medicine IV, University Hospital, Ludwig Maximilian University Munich, Munich, Germany
- Felix Beuschlein
  - Department of Medicine IV, University Hospital, Ludwig Maximilian University Munich, Munich, Germany
  - Department of Endocrinology, Diabetology and Clinical Nutrition, University Hospital Zurich (USZ) and University of Zurich (UZH), Zurich, Switzerland
  - The LOOP Medical Research Center, Zurich, Switzerland
- Jacques W M Lenders
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - Department of Internal Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
- Graeme Eisenhofer
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
2
Chubak J, Burnett-Hartman AN, Barlow WE, Corley DA, Croswell JM, Neslund-Dudas C, Vachani A, Silver MI, Tiro JA, Kamineni A. Estimating Cancer Screening Sensitivity and Specificity Using Healthcare Utilization Data: Defining the Accuracy Assessment Interval. Cancer Epidemiol Biomarkers Prev 2022; 31:1517-1520. [PMID: 35916602 PMCID: PMC9484579 DOI: 10.1158/1055-9965.epi-22-0232] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/02/2022] [Revised: 04/29/2022] [Accepted: 05/23/2022] [Indexed: 11/16/2022]
Abstract
The effectiveness and efficiency of cancer screening in real-world settings depend on many factors, including test sensitivity and specificity. Outside of select experimental studies, not everyone receives a gold standard test that can serve as a comparator in estimating screening test accuracy. Thus, many studies of screening test accuracy use the passage of time to infer whether or not cancer was present at the time of the screening test, particularly for patients with a negative screening test. We define the accuracy assessment interval as the period of time after a screening test that is used to estimate the test's accuracy. We describe how the length of this interval may bias sensitivity and specificity estimates. We call for future research to quantify bias and uncertainty in accuracy estimates and to provide guidance on setting accuracy assessment interval lengths for different cancers and screening modalities.
Affiliation(s)
- Jessica Chubak
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA
  - Department of Epidemiology, University of Washington, Seattle, WA
- Andrea N. Burnett-Hartman
  - Kaiser Permanente Colorado Institute for Health Research, Aurora, CO
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
- Christine Neslund-Dudas
  - Department of Public Health Sciences and Henry Ford Cancer Institute, Henry Ford Health System, Detroit, MI
- Anil Vachani
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Michelle I. Silver
  - Division of Public Health Sciences, Washington University School of Medicine, St. Louis, MO
- Jasmin A. Tiro
  - Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX
  - Simmons Comprehensive Cancer Center, Dallas, TX
- Aruna Kamineni
  - Kaiser Permanente Washington Health Research Institute, Seattle, WA
3
Day E, Eldred-Evans D, Prevost AT, Ahmed HU, Fiorentino F. Adjusting for verification bias in diagnostic accuracy measures when comparing multiple screening tests - an application to the IP1-PROSTAGRAM study. BMC Med Res Methodol 2022; 22:70. [PMID: 35300611 PMCID: PMC8932251 DOI: 10.1186/s12874-021-01481-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Accepted: 11/18/2021] [Indexed: 11/29/2022]
Abstract
Introduction Novel screening tests used to detect a target condition are compared against either a reference standard or other existing screening methods. However, as it is not always possible to apply the reference standard on the whole population under study, verification bias is introduced. Statistical methods exist to adjust estimates to account for this bias. We extend common methods to adjust for verification bias when multiple tests are compared to a reference standard using data from a prospective double blind screening study for prostate cancer. Methods Begg and Greenes method and multiple imputation are extended to include the results of multiple screening tests which determine condition verification status. These two methods are compared to the complete case analysis using the IP1-PROSTAGRAM study data. IP1-PROSTAGRAM used a paired-cohort double-blind design to evaluate the use of imaging as alternative tests to screen for prostate cancer, compared to a blood test called prostate specific antigen (PSA). Participants with positive imaging (index) and/or PSA (control) underwent a prostate biopsy (reference standard). Results When comparing complete case results to Begg and Greenes and methods of multiple imputation there is a statistically significant increase in the specificity estimates for all screening tests. Sensitivity estimates remained similar across the methods, with completely overlapping 95% confidence intervals. Negative predictive value (NPV) estimates were higher when adjusting for verification bias, compared to complete case analysis, even though the 95% confidence intervals overlap. Positive predictive value (PPV) estimates were similar across all methods. Conclusion Statistical methods are required to adjust for verification bias in accuracy estimates of screening tests. Expanding Begg and Greenes method to include multiple screening tests can be computationally intensive, hence multiple imputation is recommended, especially as it can be modified for low prevalence of the target condition. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01481-w.
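To make the verification-bias adjustment described above more concrete, here is a minimal Python sketch of the classic single-test Begg and Greenes correction. It is not the multi-test extension developed in the paper. It assumes that whether a participant is verified depends only on the screening result, and every count and function name in it is hypothetical.

```python
def begg_greenes(n_pos, n_neg, ver_pos_d, ver_pos_nd, ver_neg_d, ver_neg_nd):
    """Single-test Begg and Greenes correction for partial verification.

    n_pos, n_neg           : all screen-positives / screen-negatives
    ver_pos_d, ver_pos_nd  : verified screen-positives with / without disease
    ver_neg_d, ver_neg_nd  : verified screen-negatives with / without disease
    Returns (sensitivity, specificity) corrected for verification bias.
    """
    # Disease probability given the test result, estimated from verified cases.
    p_d_pos = ver_pos_d / (ver_pos_d + ver_pos_nd)
    p_d_neg = ver_neg_d / (ver_neg_d + ver_neg_nd)
    # Project those probabilities onto everyone who was screened.
    d_pos, nd_pos = n_pos * p_d_pos, n_pos * (1 - p_d_pos)
    d_neg, nd_neg = n_neg * p_d_neg, n_neg * (1 - p_d_neg)
    return d_pos / (d_pos + d_neg), nd_neg / (nd_neg + nd_pos)


def complete_case(ver_pos_d, ver_pos_nd, ver_neg_d, ver_neg_nd):
    """Naive estimate that uses verified participants only."""
    return (ver_pos_d / (ver_pos_d + ver_neg_d),
            ver_neg_nd / (ver_neg_nd + ver_pos_nd))


# Hypothetical study: 200 screen-positives and 800 screen-negatives,
# with only 100 participants verified in each group.
naive = complete_case(80, 20, 10, 90)                 # about (0.889, 0.818)
corrected = begg_greenes(200, 800, 80, 20, 10, 90)    # about (0.667, 0.947)
```

In this made-up example the corrected specificity is higher than the complete-case value, the same direction as the specificity result reported in the abstract. The size of the shift depends entirely on the invented counts.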
Affiliation(s)
- Emily Day
  - Imperial Clinical Trials Unit, Imperial College London, London, UK
- David Eldred-Evans
  - Imperial Prostate, Division of Surgery, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
- A Toby Prevost
  - Nightingale-Saunders Unit, King's Clinical Trials Unit, King's College London, London, UK
- Hashim U Ahmed
  - Imperial Prostate, Division of Surgery, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
  - Imperial Urology, Imperial College Healthcare NHS Trust, London, UK
- Francesca Fiorentino
  - Imperial Clinical Trials Unit, Imperial College London, London, UK
  - Nightingale-Saunders Unit, King's Clinical Trials Unit, King's College London, London, UK
  - Division of Surgery, Imperial College London, St Mary's Hospital, Praed Street, London, W2 1NY, UK
4
Umemneku Chikere CM, Wilson K, Graziadio S, Vale L, Allen AJ. Diagnostic test evaluation methodology: A systematic review of methods employed to evaluate diagnostic tests in the absence of gold standard - An update. PLoS One 2019; 14:e0223832. [PMID: 31603953 PMCID: PMC6788703 DOI: 10.1371/journal.pone.0223832] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 06/03/2019] [Accepted: 09/29/2019] [Indexed: 12/29/2022]
Abstract
OBJECTIVE To systematically review methods developed and employed to evaluate the diagnostic accuracy of medical tests when there is a missing or no gold standard. STUDY DESIGN AND SETTINGS Articles that proposed or applied any methods to evaluate the diagnostic accuracy of medical test(s) in the absence of a gold standard were reviewed. The protocol for this review was registered in PROSPERO (CRD42018089349). RESULTS Identified methods were classified into four main groups: methods employed when there is a missing gold standard; correction methods (which make adjustment for an imperfect reference standard with known diagnostic accuracy measures); methods employed to evaluate a medical test using multiple imperfect reference standards; and other methods, like agreement studies, and a mixed group of alternative study designs. Fifty-one statistical methods were identified from the review that were developed to evaluate medical test(s) when the true disease status of some participants is unverified with the gold standard. Seven correction methods were identified and four methods were identified to evaluate medical test(s) using multiple imperfect reference standards. Flow-diagrams were developed to guide the selection of appropriate methods. CONCLUSION Various methods have been proposed to evaluate medical test(s) in the absence of a gold standard for some or all participants in a diagnostic accuracy study. These methods depend on the availability of the gold standard, its application to the participants in the study and the availability of alternative reference standard(s). The clinical application of some of these methods, especially methods developed for when there is a missing gold standard, is however limited. This may be due to the complexity of these methods and/or a disconnection between the fields of expertise of those who develop (e.g. mathematicians) and those who employ the methods (e.g. clinical researchers). This review aims to help close this gap with our classification and guidance tools.
Affiliation(s)
- Chinyereugo M. Umemneku Chikere
  - Institute of Health & Society, Faculty of Medical Sciences Newcastle University, Newcastle upon Tyne, England, United Kingdom
- Kevin Wilson
  - School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, England, United Kingdom
- Sara Graziadio
  - National Institute for Health Research, Newcastle In Vitro Diagnostics Co-operative, Newcastle upon Tyne Hospitals National Health Services Foundation Trust, Newcastle upon Tyne, England, United Kingdom
- Luke Vale
  - Institute of Health & Society, Faculty of Medical Sciences Newcastle University, Newcastle upon Tyne, England, United Kingdom
- A. Joy Allen
  - National Institute for Health Research, Newcastle In Vitro Diagnostics Co-operative, Newcastle University, Newcastle upon Tyne, England, United Kingdom
5
Naaktgeboren CA, de Groot JAH, Rutjes AWS, Bossuyt PMM, Reitsma JB, Moons KGM. Anticipating missing reference standard data when planning diagnostic accuracy studies. BMJ 2016; 352:i402. [PMID: 26861453 PMCID: PMC4772780 DOI: 10.1136/bmj.i402] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/16/2022]
Abstract
Results obtained using a reference standard may be missing for some participants in diagnostic accuracy studies. This paper looks at methods for dealing with such missing data when designing or conducting a prospective diagnostic accuracy study.
Affiliation(s)
- Christiana A Naaktgeboren
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Joris A H de Groot
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Anne W S Rutjes
  - CTU Bern, Department of Clinical Research, University of Bern, Switzerland
  - Institute of Social and Preventive Medicine, University of Bern, Switzerland
- Patrick M M Bossuyt
  - Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands
- Johannes B Reitsma
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
- Karel G M Moons
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, 3584 CG Utrecht, Netherlands
6
Sun X, Allison C, Matthews FE, Zhang Z, Auyeung B, Baron-Cohen S, Brayne C. Exploring the Underdiagnosis and Prevalence of Autism Spectrum Conditions in Beijing. Autism Res 2015; 8:250-60. [PMID: 25952676 PMCID: PMC4690159 DOI: 10.1002/aur.1441] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/05/2014] [Accepted: 11/17/2014] [Indexed: 01/07/2023]
Abstract
Previous studies reported that the prevalence of Autism Spectrum Conditions (ASC) in mainland China is much lower than estimates from developed countries (around 1%). The aim of the study is to apply current screening and standardized diagnostic instruments to a Chinese population to establish a prevalence estimate of ASC in an undiagnosed population in mainland China. We followed the design development used previously in the UK published in 2009 by Baron‐Cohen and colleagues. The Mandarin Childhood Autism Spectrum Test (CAST) was validated by screening primary school pupils (n = 737 children aged 6–10 years) in Beijing and by conducting diagnostic assessments using the Autism Diagnostic Observation Schedule and the Autism Diagnostic Interview‐Revised. The prevalence estimate was generated after adjusting and imputing for missing values using inverse probability weighting. Response was high (97%). Using the UK cutoff (≥15), the CAST has 84% sensitivity and 96% specificity (95% confidence interval [CI]: 46, 98, and 96, 97, respectively). Six out of 103 children, not previously diagnosed, were found to meet the diagnostic criteria (8.5 after adjustment, 95% CI: 1.6, 15.4). The preliminary prevalence in an undiagnosed primary school population in mainland China was 119 per 10,000 (95% CI: 53, 265). The utility of CAST is acceptable as a screening instrument for ASC in large epidemiological studies in China. Using a comparable method, the preliminary prevalence estimate of ASC in mainland China is similar to estimates from developed countries. Autism Res 2015, 8: 250–260. © 2015 The Authors. Autism Research published by Wiley Periodicals, Inc. on behalf of International Society for Autism Research.
Affiliation(s)
- Xiang Sun
  - Department of Public Health and Primary Care, Cambridge Institute of Public Health, University of Cambridge, Cambridge, UK
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK
  - Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
- Carrie Allison
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK
  - Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
- Fiona E Matthews
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- Zhixiang Zhang
  - Pediatrics Department, Peking University First Hospital, Beijing, China
- Bonnie Auyeung
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK
  - Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
  - Department of Psychology, University of Edinburgh, Edinburgh, UK
- Simon Baron-Cohen
  - Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, UK
  - Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
- Carol Brayne
  - Department of Public Health and Primary Care, Cambridge Institute of Public Health, University of Cambridge, Cambridge, UK
7
Margel D, Benjaminov O, Ozalvo R, Shavit Grievink L, Kedar I, Yerushalmi R, Ben-Aharon I, Neiman V, Yossepowitch O, Kedar D, Levy Z, Shohat M, Brenner B, Baniel J, Rosenbaum E. Personalized prostate cancer screening among men with high risk genetic predisposition- study protocol for a prospective cohort study. BMC Cancer 2014; 14:528. [PMID: 25047061 PMCID: PMC4223504 DOI: 10.1186/1471-2407-14-528] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/23/2014] [Accepted: 07/10/2014] [Indexed: 12/24/2022]
Abstract
Background Prostate cancer screening among the general population is highly debatable. Nevertheless, screening among high-risk groups is appealing. Prior data suggest that men carrying mutations in the BRCA1 and BRCA2 genes may be at increased risk of developing prostate cancer. Additionally, they appear to develop prostate cancer at a younger age and with a more aggressive course. However, prior studies did not systematically perform prostate biopsies and thus cannot determine the true prevalence of prostate cancer in this population. Methods This will be a prospective diagnostic trial of screening for prostate cancer among men with genetic predisposition. The target population is males (40–70 years old) carrying a BRCA1 and/or BRCA2 germ line mutation. They will be identified via our Genetic counseling unit. After signing an informed consent, all men will undergo the following tests: PSA, free to total PSA, MRI of prostate and prostate biopsy. The primary endpoint will be to estimate the prevalence, stage and grade of prostate cancer in this population. Additionally, the study aims to estimate the impact of these germ line mutations on benign prostatic hyperplasia. Furthermore, this study aims to create a bio-bank of tissue, urine and serum of this unique cohort for future investigations. Finally, this study will identify an inception cohort for future interventional studies of primary and secondary prevention. Discussion The proposed research is highly translational and focuses not only on the clinical results, but on the future specimens that will be used to advance our understanding of prostate cancer pathophysiology. Most importantly, these high-risk germ-line mutation carriers are ideal candidates for primary and secondary prevention initiatives. Trial registration ClinicalTrials.gov: NCT02053805.
Affiliation(s)
- David Margel
  - Division of Urology, Rabin Medical Center, Beilinson Campus, Petah-Tikva, Israel
8
Collins J, Huynh M. Estimation of diagnostic test accuracy without full verification: a review of latent class methods. Stat Med 2014; 33:4141-69. [PMID: 24910172 DOI: 10.1002/sim.6218] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Received: 09/09/2013] [Revised: 05/02/2014] [Accepted: 05/05/2014] [Indexed: 11/09/2022]
Abstract
The performance of a diagnostic test is best evaluated against a reference test that is without error. For many diseases, this is not possible, and an imperfect reference test must be used. However, diagnostic accuracy estimates may be biased if inaccurately verified status is used as the truth. Statistical models have been developed to handle this situation by treating disease as a latent variable. In this paper, we conduct a systematized review of statistical methods using latent class models for estimating test accuracy and disease prevalence in the absence of complete verification.
Affiliation(s)
- John Collins
  - Rehabilitation Medicine Department, National Institutes of Health, Bethesda MD 20892, U.S.A
9
Whiting PF, Rutjes AWS, Westwood ME, Mallett S. A systematic review classifies sources of bias and variation in diagnostic test accuracy studies. J Clin Epidemiol 2013; 66:1093-104. [PMID: 23958378 DOI: 10.1016/j.jclinepi.2013.05.014] [Citation(s) in RCA: 190] [Impact Index Per Article: 17.3] [Received: 07/18/2012] [Revised: 05/08/2013] [Accepted: 05/15/2013] [Indexed: 11/15/2022]
Abstract
OBJECTIVE To classify the sources of bias and variation and to provide an updated summary of the evidence of the effects of each source of bias and variation. STUDY DESIGN AND SETTING We conducted a systematic review of studies of any design with the main objective of addressing bias or variation in the results of diagnostic accuracy studies. We searched MEDLINE, EMBASE, BIOSIS, the Cochrane Methodology Register, and Database of Abstracts of Reviews of Effects (DARE) from 2001 to October 2011. Citation searches based on three key papers were conducted, and studies from our previous review (search to 2001) were eligible. One reviewer extracted data on the study design, objective, sources of bias and/or variation, and results. A second reviewer checked the extraction. RESULTS We summarized the number of studies providing evidence of an effect arising from each source of bias and variation on the estimates of sensitivity, specificity, and overall accuracy. CONCLUSIONS We found consistent evidence for the effects of case-control design, observer variability, availability of clinical information, reference standard, partial and differential verification bias, demographic features, and disease prevalence and severity. Effects were generally stronger for sensitivity than for specificity. Evidence for other sources of bias and variation was limited.
Affiliation(s)
- Penny F Whiting
  - Kleijnen Systematic Reviews Ltd, Unit 6, Escrick Business Park, Riccall Road, Escrick, York YO19 6FD, United Kingdom
10
Abbey CK, Eckstein MP, Boone JM. Estimating the relative utility of screening mammography. Med Decis Making 2013; 33:510-20. [PMID: 23295543 DOI: 10.1177/0272989x12470756] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022]
Abstract
BACKGROUND The concept of diagnostic utility is a fundamental component of signal detection theory, going back to some of its earliest works. Attaching utility values to the various possible outcomes of a diagnostic test should, in principle, lead to meaningful approaches to evaluating and comparing such systems. However, in many areas of medical imaging, utility is not used because it is presumed to be unknown. METHODS In this work, we estimate relative utility (the utility benefit of a detection relative to that of a correct rejection) for screening mammography using its known relation to the slope of a receiver operating characteristic (ROC) curve at the optimal operating point. The approach assumes that the clinical operating point is optimal for the goal of maximizing expected utility and therefore the slope at this point implies a value of relative utility for the diagnostic task, for known disease prevalence. We examine utility estimation in the context of screening mammography using the Digital Mammographic Imaging Screening Trials (DMIST) data. RESULTS We show how various conditions can influence the estimated relative utility, including characteristics of the rating scale, verification time, probability model, and scope of the ROC curve fit. Relative utility estimates range from 66 to 227. CONCLUSIONS We argue for one particular set of conditions that results in a relative utility estimate of 162 (±14%). This is broadly consistent with values in screening mammography determined previously by other means. At the disease prevalence found in the DMIST study (0.59% at 365-day verification), optimal ROC slopes are near unity, suggesting that utility-based assessments of screening mammography will be similar to those found using Youden's index.
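The link between ROC slope and relative utility mentioned above can be checked with a few lines of Python. The sketch assumes the standard expected-utility result from signal detection theory, in which the ROC slope at the optimal operating point equals (1 - prevalence) / (prevalence × relative utility). The abstract refers to a "known relation" without stating it, so treating it as this formula is an assumption. The function names are hypothetical.

```python
def optimal_roc_slope(relative_utility, prevalence):
    """ROC slope at the expected-utility-maximizing operating point.

    relative_utility: benefit of a detection relative to that of a
                      correct rejection (definition taken from the abstract).
    prevalence:       disease prevalence in the screened population.
    """
    return (1.0 - prevalence) / (prevalence * relative_utility)


def relative_utility_from_slope(roc_slope, prevalence):
    """Invert the same relation: infer relative utility from an ROC slope."""
    return (1.0 - prevalence) / (prevalence * roc_slope)


# Values quoted in the abstract: relative utility 162, prevalence 0.59%.
slope = optimal_roc_slope(162, 0.0059)   # roughly 1.04
```

Under this assumed relation, the abstract's figures give a slope of about 1.04, in line with its statement that optimal ROC slopes are near unity.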
Affiliation(s)
- Craig K Abbey
  - Department of Psychology, University of California, Santa Barbara, CA (CKA, ME)
  - Department of Radiology, UC Davis Medical Center, Sacramento, CA (CKA, JMB)
- Miguel P Eckstein
  - Department of Psychology, University of California, Santa Barbara, CA (CKA, ME)
- John M Boone
  - Department of Radiology, UC Davis Medical Center, Sacramento, CA (CKA, JMB)