51
|
Marinovich ML, Houssami N, Macaskill P, von Minckwitz G, Blohmer JU, Irwig L. Accuracy of ultrasound for predicting pathologic response during neoadjuvant therapy for breast cancer. Int J Cancer 2014; 136:2730-7. [PMID: 25387885 DOI: 10.1002/ijc.29323] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 08/01/2014] [Accepted: 10/23/2014] [Indexed: 12/11/2022]
Abstract
Early assessment of response to neoadjuvant chemotherapy (NAC) for breast cancer allows therapy to be tailored; however, optimal response assessment methods have not been established. We estimated the accuracy of ultrasound (US) to predict pathologic complete response (pCR) using common response criteria and pCR definitions, and estimated incremental accuracy over known prognostic variables. Participants undergoing US after two cycles in the GeparTrio trial randomised to no change in NAC were eligible. US response by World Health Organisation (WHO) criteria (1D or 2D) and Response Evaluation Criteria In Solid Tumours (RECIST) was assessed. Four pCR definitions were applied. Sensitivity (correct prediction of pCR), specificity (correct prediction of no-pCR) and diagnostic odds ratios (DORs) were calculated. Areas under the curve (AUCs) were derived from logistic regression including patient variables with and without US. In 832 patients, DORs decreased as pCR definitions became less stringent (p = 0.01). For WHO-2D, DORs were as follows: 4.07 (ypT0,ypN0), 3.75 (ypT0/is,ypN0), 3.14 (ypT0/is,ypN+/-) and 2.65 (ypT0/is/1a,ypN+/-). DORs did not differ between US criteria (p = 0.60). High sensitivity and lower specificity were found for WHO-2D and RECIST; WHO-1D was highly specific with low sensitivity. Sensitivity was highest for WHO-2D predicting ypT0,ypN0 (sensitivity = 81.7%, specificity = 47.6% vs. 42.3% and 80.4% for WHO-1D). Adding US to models including patient variables (age, T-stage, histology and subtype) improved AUCs for predicting pCR by 2-3%. In conclusion, US accuracy is highest for predicting ypT0,ypN0, shown to be most prognostic of long-term survival. WHO-2D and RECIST maximise sensitivity; WHO-1D maximises specificity. US modestly improves the prediction of pCR by patient characteristics.
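As a rough check on how the reported accuracy measures relate, the diagnostic odds ratio can be recomputed from a sensitivity/specificity pair. The sketch below uses the standard definition DOR = [sens/(1 - sens)] / [(1 - spec)/spec]; the function name is illustrative and is not taken from the paper, and the inputs are the rounded percentages quoted in the abstract for ypT0,ypN0, so the results only approximate the published values (for WHO-2D it gives a value close to the reported 4.07).

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the diagnostic odds ratio for a sensitivity/specificity pair.

    DOR = odds of a positive test in those with the outcome
          divided by odds of a positive test in those without it.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_given_outcome = sensitivity / (1.0 - sensitivity)
    odds_positive_given_no_outcome = (1.0 - specificity) / specificity
    return odds_positive_given_outcome / odds_positive_given_no_outcome


# Rounded values quoted in the abstract for predicting ypT0,ypN0.
dor_who_2d = diagnostic_odds_ratio(0.817, 0.476)  # WHO-2D
dor_who_1d = diagnostic_odds_ratio(0.423, 0.804)  # WHO-1D

print(f"WHO-2D DOR from rounded inputs: {dor_who_2d:.2f}")
print(f"WHO-1D DOR from rounded inputs: {dor_who_1d:.2f}")
```

The two criteria trade sensitivity against specificity in opposite directions, so each DOR has to be read alongside its sensitivity/specificity pair rather than on its own.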
|
52
|
Hersch J, Jansen J, Barratt A, Irwig L, Houssami N, Jacklyn G, Thornton H, Dhillon H, McCaffery K. Overdetection in breast cancer screening: development and preliminary evaluation of a decision aid. BMJ Open 2014; 4:e006016. [PMID: 25256188 PMCID: PMC4179580 DOI: 10.1136/bmjopen-2014-006016] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 07/01/2014] [Revised: 08/19/2014] [Accepted: 09/05/2014] [Indexed: 11/22/2022]
Abstract
OBJECTIVE To develop, pilot and refine a decision aid (ahead of a randomised trial evaluation) for women around age 50 facing their initial decision about whether to undergo mammography screening. DESIGN Two-stage mixed-method pilot study including qualitative interviews (n=15) and a randomised comparison using a quantitative survey (n=34). SETTING New South Wales, Australia. PARTICIPANTS Women aged 43-59 years with no personal history of breast cancer. INTERVENTIONS The decision aid provides evidence-based information about important outcomes of mammography screening over 20 years (breast cancer mortality reduction, overdetection and false positives) compared with no screening. The information is presented in a short booklet for women, combining text and visual formats. A control version produced for the purposes of comparison omits the overdetection-related content. OUTCOMES Comprehension of key decision aid content and acceptability of the materials. RESULTS Most women considered the decision aid clear and helpful and would recommend it to others. Nonetheless, the piloting process raised important issues that we tried to address in iterative revisions. Some participants found it hard to understand overdetection and why it is of concern, while there was often confusion about the distinction between overdetection and false positives. In a screening context, encountering balanced information rather than persuasion appears to be contrary to people's expectations, but women appreciated the opportunity to become better informed. CONCLUSIONS The concept of overdetection is complex and new to the public. This study highlights some key challenges for communicating about this issue. It is important to clarify that overdetection differs from false positives in terms of its more serious consequences (overtreatment and associated harms). Screening decision aids also must clearly explain their purpose of facilitating informed choice. 
A staged approach to development and piloting of decision aids is recommended to further improve understanding of overdetection and support informed decision-making about screening.
|
53
|
Downie A, Williams CM, Henschke N, Hancock MJ, Ostelo RWJG, de Vet HCW, Macaskill P, Irwig L, van Tulder MW, Koes BW, Maher CG. Red flags to screen for malignancy and fracture in patients with low back pain. Br J Sports Med 2014; 48:1518. [DOI: 10.1136/bjsports-2014-f7095rep] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/03/2022]
|
54
|
Bell KJL, Beller E, Sundström J, McGeechan K, Hayen A, Irwig L, Neal B, Glasziou P. Ambulatory blood pressure adds little to Framingham Risk Score for the primary prevention of cardiovascular disease in older men: secondary analysis of observational study data. BMJ Open 2014; 4:e006044. [PMID: 25200562 PMCID: PMC4158214 DOI: 10.1136/bmjopen-2014-006044] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
OBJECTIVE To determine the incremental value of ambulatory blood pressure (BP) in predicting cardiovascular risk when the Framingham Risk Score (FRS) is known. METHODS We included 780 men without cardiovascular disease from the Uppsala Longitudinal Study of Adult Men, all aged approximately 70 years at baseline. We first screened ambulatory systolic BP (ASBP) parameters for their incremental value by adding them to a model with 10-year FRS. For the best ASBP parameter we estimated HRs and changes in discrimination, calibration and reclassification. We also estimated the difference in the number of men started on treatment and in the number of men protected against a cardiovascular event. RESULTS Mean daytime ASBP had the highest incremental value; adding other parameters did not yield further improvements. While ASBP was an independent risk factor for cardiovascular disease, addition to FRS led to only small increases to the overall model fit, discrimination (a 1% increase in the area under the receiver operating characteristic (ROC) curve), calibration and reclassification. We estimated that for every 10,000 men screened with ASBP, 141 fewer would start a new BP-lowering treatment (95% CI 62 to 220 less treated), but this would result in 7 fewer cardiovascular events prevented over the subsequent 10 years (95% CI 21 fewer events prevented to 7 more events prevented). CONCLUSIONS In addition to a standard cardiovascular risk assessment it is not clear that ambulatory BP measurement provides further incremental value. The clinical role of ambulatory BP requires ongoing careful consideration.
|
55
|
Bonner C, Jansen J, McKinn S, Irwig L, Doust J, Glasziou P, McCaffery K. How do general practitioners and patients make decisions about cardiovascular disease risk? Health Psychol 2014; 34:253-61. [PMID: 25133842 DOI: 10.1037/hea0000122] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
Abstract
OBJECTIVE Although current guidelines around the world recommend using absolute risk (AR) thresholds to decide whether cardiovascular disease (CVD) risk should be managed with lifestyle or medication, the use of AR in clinical practice is limited. The aim of this study was to explore the factors that influence general practitioner (GP) and patient decision making about CVD risk management, including the role of risk perception. METHODS Qualitative descriptive study involving semi-structured interviews with 25 GPs and 38 patients in Australia in 2011-2012. Transcribed audio-recordings were thematically coded and a Framework Analysis method was used. RESULTS GPs rarely mentioned AR thresholds but were influenced by their subjective perception of the patient's risk and motivation, and their own attitudes toward prevention, including concerns about medication side effects and the efficacy of lifestyle change. Patients were influenced by individual risk factors, their own motivation to change lifestyle, and attitudes toward medication: initially negative, but this improved if medication was more effective than lifestyle. High perceived risk led to medication being recommended by GPs and accepted by patients, but this was not necessarily based on AR. Patient perceptions of high risk also increased motivation to change lifestyle, particularly if they were resistant to the idea of taking medication. CONCLUSIONS Perceived risk, motivation, and attitudes appeared to be more important than AR thresholds in this study. CVD risk management guidelines could be more useful if they include strategies to help GPs consider patients' risk perception, motivation, and attitudes as well as evidence-based recommendations.
|
56
|
McGeechan K, Macaskill P, Irwig L, Bossuyt PMM. An assessment of the relationship between clinical utility and predictive ability measures and the impact of mean risk in the population. BMC Med Res Methodol 2014; 14:86. [PMID: 24989719 PMCID: PMC4105158 DOI: 10.1186/1471-2288-14-86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/11/2013] [Accepted: 06/26/2014] [Indexed: 12/29/2022]
Abstract
BACKGROUND Measures of clinical utility (net benefit and event free life years) have been recommended in the assessment of a new predictor in a risk prediction model. However, it is not clear how they relate to the measures of predictive ability and reclassification, such as the c-statistic and Net Reclassification Improvement (NRI), or how these measures are affected by differences in mean risk between populations when a fixed cutpoint to define high risk is assumed. METHODS We examined the relationship between measures of clinical utility (net benefit, event free life years) and predictive ability (c-statistic, binary c-statistic, continuous NRI(0), NRI with two cutpoints, binary NRI) using simulated data and the Framingham dataset. RESULTS In the analysis of simulated data, the addition of a new predictor tended to result in more people being treated when the mean risk was less than the cutpoint, and fewer people being treated for mean risks beyond the cutpoint. The reclassification and clinical utility measures showed similar relationships with mean risk when the mean risk was less than the cutpoint and the baseline model was not strong. However, when the mean risk was greater than the cutpoint, or the baseline model was strong, the reclassification and clinical utility measures diverged in their relationship with mean risk. Although the risk of CVD was lower for women compared to men in the Framingham dataset, the measures of predictive ability, reclassification and clinical utility were all larger for women. The difference in these results was, in part, due to the larger hazard ratio associated with the additional risk predictor (systolic blood pressure) for women. CONCLUSION Measures such as the c-statistic and the measures of reclassification do not capture the consequences of implementing different prediction models. We do not recommend their use in evaluating which new predictors may be clinically useful in a particular population. 
We recommend that a measure such as net benefit or EFLY is calculated and, where appropriate, the measure is weighted to account for differences in the distribution of risks between the study population and the population in which the new predictors will be implemented.
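For readers unfamiliar with net benefit, a minimal sketch follows. It assumes the commonly used decision-curve definition, NB = TP/n - (FP/n) x pt/(1 - pt), where pt is the risk cutpoint used to define high risk; the abstract does not spell out its formula, so this definition is an assumption, and the function name and all counts are hypothetical illustrations rather than data from the study.

```python
def net_benefit(true_positives: int, false_positives: int,
                n_total: int, risk_cutpoint: float) -> float:
    """Net benefit of treating everyone classified as high risk.

    False positives are weighted by the odds of the risk cutpoint,
    which encodes how many unnecessary treatments are considered
    an acceptable price for one correctly treated person.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0.0 < risk_cutpoint < 1.0:
        raise ValueError("risk_cutpoint must lie strictly between 0 and 1")
    weight = risk_cutpoint / (1.0 - risk_cutpoint)
    return true_positives / n_total - (false_positives / n_total) * weight


# Hypothetical counts for a baseline model and a model with an added predictor,
# both evaluated at the same fixed cutpoint of 20% risk.
nb_baseline = net_benefit(true_positives=80, false_positives=150,
                          n_total=1000, risk_cutpoint=0.20)
nb_extended = net_benefit(true_positives=85, false_positives=140,
                          n_total=1000, risk_cutpoint=0.20)

print(f"baseline net benefit: {nb_baseline:.4f}")
print(f"extended net benefit: {nb_extended:.4f}")
print(f"difference:           {nb_extended - nb_baseline:.4f}")
```

Unlike the c-statistic, this quantity depends on how many people fall on each side of the cutpoint, which is why it shifts with the mean risk of the population being assessed.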
|
57
|
Bonner C, Jansen J, McKinn S, Irwig L, Doust J, Glasziou P, McCaffery K. Communicating cardiovascular disease risk: an interview study of General Practitioners' use of absolute risk within tailored communication strategies. BMC Fam Pract 2014; 15:106. [PMID: 24885409 PMCID: PMC4042137 DOI: 10.1186/1471-2296-15-106] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 03/06/2014] [Accepted: 05/21/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cardiovascular disease (CVD) prevention guidelines encourage assessment of absolute CVD risk - the probability of a CVD event within a fixed time period, based on the most predictive risk factors. However, few General Practitioners (GPs) use absolute CVD risk consistently, and communication difficulties have been identified as a barrier to changing practice. This study aimed to explore GPs' descriptions of their CVD risk communication strategies, including the role of absolute risk. METHODS Semi-structured interviews were conducted with a purposive sample of 25 GPs in New South Wales, Australia. Transcribed audio-recordings were thematically coded, using the Framework Analysis method to ensure rigour. RESULTS GPs used absolute CVD risk within three different communication strategies: 'positive', 'scare tactic', and 'indirect'. A 'positive' strategy, which aimed to reassure and motivate, was used for patients with low risk, determination to change lifestyle, and some concern about CVD risk. Absolute risk was used to show how they could reduce risk. A 'scare tactic' strategy was used for patients with high risk, lack of motivation, and a dismissive attitude. Absolute risk was used to 'scare' them into taking action. An 'indirect' strategy, where CVD risk was not the main focus, was used for patients with low risk but some lifestyle risk factors, high anxiety, high resistance to change, or difficulty understanding probabilities. Non-quantitative absolute risk formats were found to be helpful in these situations. CONCLUSIONS This study demonstrated how GPs use three different communication strategies to address the issue of CVD risk, depending on their perception of patient risk, motivation and anxiety. Absolute risk played a different role within each strategy. Providing GPs with alternative ways of explaining absolute risk, in order to achieve different communication aims, may improve their use of absolute CVD risk assessment in practice.
|
58
|
Marinovich ML, Houssami N, Macaskill P, Von Minckwitz G, Blohmer JU, Irwig L. Accuracy of ultrasound during neoadjuvant therapy for breast cancer to predict pathologic response. J Clin Oncol 2014. [DOI: 10.1200/jco.2014.32.15_suppl.1089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
59
|
Jansen J, Bonner C, McKinn S, Irwig L, Glasziou P, Doust J, Teixeira-Pinto A, Hayen A, Turner R, McCaffery K. General practitioners' use of absolute risk versus individual risk factors in cardiovascular disease prevention: an experimental study. BMJ Open 2014; 4:e004812. [PMID: 24833688 PMCID: PMC4025465 DOI: 10.1136/bmjopen-2014-004812] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 01/08/2014] [Revised: 04/14/2014] [Accepted: 04/22/2014] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To understand general practitioners' (GPs) use of individual risk factors (blood pressure and cholesterol levels) versus absolute risk in cardiovascular disease (CVD) risk management decision-making. DESIGN Randomised experiment. Absolute risk, systolic blood pressure (SBP), cholesterol ratio (total cholesterol/high-density lipoprotein (TC/HDL)) and age were systematically varied in hypothetical cases. High absolute risk was defined as 5-year risk of a cardiovascular event >15%, high blood pressure levels varied between SBP 147 and 179 mm Hg and high cholesterol (TC/HDL ratio) between 6.5 and 7.2 mmol/L. SETTING 4 GP conferences in Australia. PARTICIPANTS 144 Australian GPs. OUTCOMES GPs indicated whether they would prescribe cholesterol and/or blood pressure lowering medication. Analyses involved logistic regression. RESULTS For patients with high blood pressure: 93% (95% CI 86% to 96%) of high absolute risk patients and 83% (95% CI 76% to 88%) of lower absolute risk patients were prescribed blood pressure medication. Conversely, 30% (95% CI 25% to 36%) of lower blood pressure patients were prescribed blood pressure medication if absolute risk was high and 4% (95% CI 3% to 5%) if lower. 69% of high cholesterol/high absolute risk patients were prescribed cholesterol medication (95% CI 61% to 77%) versus 34% of high cholesterol/lower absolute risk patients (95% CI 28% to 41%). 36% of patients with lower cholesterol (95% CI 30% to 43%) were prescribed cholesterol medication if absolute risk was high versus 10% if lower (95% CI 8% to 13%). CONCLUSIONS GPs' decision-making was more consistent with the management of individual risk factors than an absolute risk approach, especially when prescribing blood pressure medication. The results suggest medical treatment of lower risk patients (5-year risk of CVD event <15%) with mildly elevated blood pressure or cholesterol levels is likely to occur even when an absolute risk assessment is specifically provided. 
The results indicate a need for improving uptake of absolute risk guidelines and GP understanding of the rationale for using absolute risk.
|
60
|
Hersch J, Barratt A, Jansen J, Houssami N, Irwig L, Jacklyn G, Dhillon H, Thornton H, McGeechan K, Howard K, McCaffery K. The effect of information about overdetection of breast cancer on women's decision-making about mammography screening: study protocol for a randomised controlled trial. BMJ Open 2014; 4:e004990. [PMID: 24833692 PMCID: PMC4025472 DOI: 10.1136/bmjopen-2014-004990] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/07/2022]
Abstract
INTRODUCTION Women are largely unaware that mammography screening can cause overdetection of inconsequential disease, leading to overdiagnosis and overtreatment of breast cancer. Evidence is lacking about how information on overdetection affects women's breast screening decisions and experiences. This study investigates the consequences of providing information about overdetection of breast cancer to women approaching the age of invitation to mammography screening. METHODS AND ANALYSIS This is a randomised controlled trial with an embedded longitudinal qualitative substudy. Participants are a community sample of women aged 48-50 in New South Wales, Australia, recruited in 2014. Women are randomly allocated to either quantitative only follow-up (n=904) or additional qualitative follow-up (n=66). Women in each stream are then randomised to receive either the intervention (evidence-based information booklet including overdetection, breast cancer mortality reduction and false positives) or a control information booklet (including mortality reduction and false positives only). The primary outcome is informed choice about breast screening (adequate knowledge, and consistency between attitudes and intentions) assessed via telephone interview at 2 weeks postintervention. Secondary outcomes measured at this time include decision process (decisional conflict and confidence) and psychosocial outcomes (anticipated regret, anxiety, breast cancer worry and perceived risk). Women are further followed up at 6 months, 1 and 2 years to assess self-reported screening behaviour and long-term psychosocial outcomes (decision regret, quality of life). Participants in the qualitative stream undergo additional in-depth interviews at each time point to explore the views and experiences of women who do and do not choose to have screening. ETHICS AND DISSEMINATION The study has ethical approval, and results will be published in peer-reviewed journals. 
This research will help ensure that information about overdetection may be communicated clearly and effectively, using an evidence-based approach, to women considering breast cancer screening. TRIAL REGISTRATION NUMBER Australian New Zealand Clinical Trials Registry ACTRN12613001035718.
|
61
|
Bonner C, Jansen J, Newell BR, Irwig L, Glasziou P, Doust J, Dhillon H, McCaffery K. I don't believe it, but I'd better do something about it: patient experiences of online heart age risk calculators. J Med Internet Res 2014; 16:e120. [PMID: 24797339 PMCID: PMC4026572 DOI: 10.2196/jmir.3190] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 12/16/2013] [Revised: 03/19/2014] [Accepted: 04/14/2014] [Indexed: 11/13/2022]
Abstract
Background Health risk calculators are widely available on the Internet, including cardiovascular disease (CVD) risk calculators that estimate the probability of a heart attack, stroke, or death over a 5- or 10-year period. Some calculators convert this probability to “heart age”, where a heart age older than current age indicates modifiable risk factors. These calculators may impact patient decision making about CVD risk management with or without clinician involvement, but little is known about how patients use them. Previous studies have not investigated patient understanding of heart age compared to 5-year percentage risk, or the best way to present heart age. Objective This study aimed to investigate patient experiences and understanding of online heart age calculators that use different verbal, numerical, and graphical formats, based on 5- and 10-year Framingham risk equations used in clinical practice guidelines around the world. Methods General practitioners in New South Wales, Australia, recruited 26 patients with CVD/lifestyle risk factors who were not taking cholesterol or blood pressure-lowering medication in 2012. Participants were asked to “think aloud” while using two heart age calculators in random order, with semi-structured interviews before and after. Transcribed audio recordings were coded and a framework analysis method was used. Results Risk factor questions were often misinterpreted, reducing the accuracy of the calculators. Participants perceived older heart age as confronting and younger heart age as positive but unrealistic. Unexpected or contradictory results (eg, low percentage risk but older heart age) led participants to question the credibility of the calculators. Reasons to discredit the results included the absence of relevant lifestyle questions and impact of corporate sponsorship. 
However, the calculators prompted participants to consider lifestyle changes irrespective of whether they received younger, same, or older heart age results. Conclusions Online heart age calculators can be misunderstood and disregarded if they produce unexpected or contradictory results, but they may still motivate lifestyle changes. Future research should investigate both the benefits and harms of communicating risk in this way, and how to increase the reliability and credibility of online health risk calculators.
|
62
|
Turner RM, Walter SD, Macaskill P, McCaffery KJ, Irwig L. Sample Size and Power When Designing a Randomized Trial for the Estimation of Treatment, Selection, and Preference Effects. Med Decis Making 2014; 34:711-9. [PMID: 24695962 DOI: 10.1177/0272989x14525264] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 01/15/2013] [Accepted: 05/02/2013] [Indexed: 11/15/2022]
Abstract
BACKGROUND A 2-stage randomized trial design, incorporating participant choice, provides unbiased estimates of the effects of the treatment or intervention (treatment effect), the difference between outcomes for participants who prefer one treatment compared with another (selection effect), and the interaction between participants' preferences for treatment and the treatment actually received (preference effect). It is important to ensure that such trials are adequately powered to estimate these effects. SAMPLE SIZE FORMULAS This paper presents methods for determining the required sample sizes for estimating treatment, selection, and preference effects. We demonstrate the changes in sample size as various key parameters are changed. In general, approximately twice as many participants (in total) are needed to have equivalent power for detecting both treatment and selection/preference effects compared with a trial of the treatment effect alone. PRIMARY SCREENING EXAMPLE We illustrate their application for the design of a primary screening trial comparing human papillomavirus DNA testing versus cervical screening (by Pap smear). Our example would require 520 participants to have 80% power to detect moderate-sized preference and selection effects and a small to moderate treatment effect. CONCLUSIONS With the growing interest in understanding treatment choices and with the use of decision aids, well-designed and adequately powered 2-stage randomized trial designs offer the opportunity to determine the effects of participants' preferences. Our sample size formulas will help future studies ensure that they have adequate power to detect selection and preference effects.
|
63
|
Bell KJ, Beller E, Sundström J, McGeechan K, Hayen A, Irwig L, Neal B, Glasziou P. PT266 The Incremental Value of Ambulatory Blood Pressure For The Primary Prevention of Cardiovascular Disease In Older Men. Glob Heart 2014. [DOI: 10.1016/j.gheart.2014.03.2013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
64
|
Glasziou PP, Irwig L, Kirby AC, Tonkin AM, Simes RJ. Which lipid measurement should we monitor? An analysis of the LIPID study. BMJ Open 2014; 4:e003512. [PMID: 24561494 PMCID: PMC3931993 DOI: 10.1136/bmjopen-2013-003512] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/01/2013] [Revised: 01/20/2014] [Accepted: 01/21/2014] [Indexed: 11/16/2022]
Abstract
OBJECTIVES To evaluate the optimal lipid to measure in monitoring patients, we assessed three factors that influence the choice of monitoring tests: (1) clinical validity; (2) responsiveness to therapy changes and (3) the size of the long-term 'signal-to-noise' ratio. DESIGN Longitudinal analyses of repeated lipid measurement over 5 years. SETTING Subsidiary analysis of a Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) study-a clinical trial in Australia, New Zealand and Finland. PARTICIPANTS 9014 patients aged 31-75 years with previous acute coronary syndromes. INTERVENTIONS Patients were randomly assigned to 40 mg daily pravastatin or placebo. PRIMARY AND SECONDARY OUTCOME MEASURES We used data on serial lipid measurements-at randomisation, 6 months and 12 months, and then annually to 5 years-of total cholesterol; low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and their ratios; triglycerides; and apolipoproteins A and B and their ratio and their ability to predict coronary events. RESULTS All the lipid measures were statistically significantly associated with future coronary events, but the associations between each of the three ratio measures (total or LDL cholesterol to HDL cholesterol, and apolipoprotein B to apolipoprotein A1) and the time to a coronary event were better than those for any of the single lipid measures. The two cholesterol ratios also ranked highly for the long-term signal-to-noise ratios. However, LDL cholesterol and non-HDL cholesterol showed the most responsiveness to treatment change. CONCLUSIONS Lipid monitoring is increasingly common, but current guidelines vary. No single measure was best on all three criteria. Total cholesterol did not rank highly on any single criterion. 
However, measurements based on cholesterol subfractions-non-HDL cholesterol (total cholesterol minus HDL cholesterol) and the two ratios-appeared superior to total cholesterol or any of the apolipoprotein options. Guidelines should consider using non-HDL cholesterol or a ratio measure for initial treatment decisions and subsequent monitoring.
|
65
|
Downie A, Williams CM, Henschke N, Hancock MJ, Ostelo RWJG, de Vet HCW, Macaskill P, Irwig L, van Tulder MW, Koes BW, Maher CG. Red flags to screen for malignancy and fracture in patients with low back pain: systematic review. BMJ 2013; 347:f7095. [PMID: 24335669 PMCID: PMC3898572 DOI: 10.1136/bmj.f7095] [Citation(s) in RCA: 157] [Impact Index Per Article: 14.3] [Accepted: 11/18/2013] [Indexed: 02/07/2023]
Abstract
OBJECTIVE To review the evidence on diagnostic accuracy of red flag signs and symptoms to screen for fracture or malignancy in patients presenting with low back pain to primary, secondary, or tertiary care. DESIGN Systematic review. DATA SOURCES Medline, OldMedline, Embase, and CINAHL from earliest available up to 1 October 2013. INCLUSION CRITERIA Primary diagnostic studies comparing red flags for fracture or malignancy to an acceptable reference standard, published in any language. REVIEW METHODS Assessment of study quality and extraction of data was conducted by three independent assessors. Diagnostic accuracy statistics and post-test probabilities were generated for each red flag. RESULTS We included 14 studies (eight from primary care, two from secondary care, four from tertiary care) evaluating 53 red flags; only five studies evaluated combinations of red flags. Pooling of data was not possible because of index test heterogeneity. Many red flags in current guidelines provide virtually no change in probability of fracture or malignancy or have untested diagnostic accuracy. The red flags with the highest post-test probability for detection of fracture were older age (9%, 95% confidence interval 3% to 25%), prolonged use of corticosteroid drugs (33%, 10% to 67%), severe trauma (11%, 8% to 16%), and presence of a contusion or abrasion (62%, 49% to 74%). Probability of spinal fracture was higher when multiple red flags were present (90%, 34% to 99%). The red flag with the highest post-test probability for detection of spinal malignancy was history of malignancy (33%, 22% to 46%). CONCLUSIONS While several red flags are endorsed in guidelines to screen for fracture or malignancy, only a small subset of these have evidence that they are indeed informative. These findings suggest a need for revision of many current guidelines.
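The post-test probabilities reported above follow from combining a pre-test probability with a likelihood ratio via Bayes' theorem in odds form (post-test odds = pre-test odds x likelihood ratio). The sketch below shows that conversion; the function name is illustrative, and the pre-test probability and likelihood ratio in the example are hypothetical values chosen for the arithmetic, not figures from the review.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability.

    Works on the odds scale: post-test odds = pre-test odds * LR.
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical example: a 1% pre-test probability and a red flag
# with a positive likelihood ratio of 10.
example = post_test_probability(pre_test_probability=0.01, likelihood_ratio=10.0)
print(f"post-test probability: {example:.1%}")

# A likelihood ratio of 1 leaves the probability unchanged, which is the
# sense in which an uninformative red flag provides no change in probability.
unchanged = post_test_probability(pre_test_probability=0.01, likelihood_ratio=1.0)
print(f"with LR = 1: {unchanged:.1%}")
```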
|
66
|
Bonner C, Jansen J, McKinn S, Irwig L, Doust J, Glasziou P, Hayen A, McCaffery K. General practitioners' use of different cardiovascular risk assessment strategies: a qualitative study. Med J Aust 2013; 199:485-9. [PMID: 24099210 DOI: 10.5694/mja13.10133] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 01/30/2013] [Accepted: 08/05/2013] [Indexed: 11/17/2022]
Abstract
OBJECTIVES To identify factors that influence the extent to which general practitioners use absolute risk (AR) assessment in cardiovascular disease (CVD) risk assessment. DESIGN, SETTING AND PARTICIPANTS Semi-structured interviews with 25 currently practising GPs from eight Divisions of General Practice in New South Wales, Australia, between October 2011 and May 2012. Data were analysed using framework analysis. RESULTS The study identified five strategies that GPs use with patients in different situations, defined in terms of the extent to which AR was used and the reasons given for this: the AR-focused strategy, used when AR assessment was considered useful for the patient; the AR-adjusted strategy, used to account for additional risk factors such as family history; the clinical judgement strategy, used when GPs considered that their judgement took multiple risk factors into account as effectively as AR; the passive disregard strategy, used when GPs lacked sufficient time, access or experience to use AR; and the active disregard strategy, used when AR was considered to be inappropriate for the patient. The strategies were linked with different opportunity, capability and motivation barriers to the use of AR. CONCLUSIONS This study provides an in-depth insight into the factors that influence GPs' use of AR in CVD risk assessment. The results suggest that GPs use a range of strategies in different situations, so different approaches may be required to improve the use of AR guidelines in practice.
|
67
|
Bell KJL, Glasziou PP, Hayen A, Irwig L. Criteria for monitoring tests were described: validity, responsiveness, detectability of long-term change, and practicality. J Clin Epidemiol 2013; 67:152-9. [PMID: 24189088 DOI: 10.1016/j.jclinepi.2013.07.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/09/2012] [Revised: 07/10/2013] [Accepted: 07/19/2013] [Indexed: 10/26/2022]
Abstract
OBJECTIVES To describe how evidence from trials and cohort studies may be used to guide choice of test for monitoring patients with chronic disease. STUDY DESIGN AND SETTING Exploration of potential criteria for choosing the best monitoring test. Criteria are defined and options for assessment measures for test performance on each criterion discussed. RESULTS Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to control or prevent. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of use, invasiveness, and cost of the test. Test performance generally requires longitudinal data from trial and/or cohort studies using statistical methods such as those discussed. CONCLUSION Four specific criteria can help clinicians inform evidence-based decisions on which monitoring test to use.
|
68
|
Lucas N, Macaskill P, Irwig L, Moran R, Rickards L, Turner R, Bogduk N. The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL). BMC Med Res Methodol 2013; 13:111. [PMID: 24010406 PMCID: PMC3847619 DOI: 10.1186/1471-2288-13-111] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 02/21/2013] [Accepted: 09/05/2013] [Indexed: 12/01/2022] Open
Abstract
Background The aim of this project was to investigate the reliability of a new 11-item quality appraisal tool for studies of diagnostic reliability (QAREL). The tool was tested on studies reporting the reliability of any physical examination procedure. The reliability of physical examination is a challenging area to study given the complex testing procedures, the range of tests, and lack of procedural standardisation. Methods Three reviewers used QAREL to independently rate 29 articles, comprising 30 studies, published during 2007. The articles were identified from a search of relevant databases using the following string: “Reproducibility of results (MeSH) OR reliability (t.w.) AND Physical examination (MeSH) OR physical examination (t.w.).” A total of 415 articles were retrieved and screened for inclusion. The reviewers undertook an independent trial assessment prior to data collection, followed by a general discussion about how to score each item. At no time did the reviewers discuss individual papers. Reliability was assessed for each item using multi-rater kappa (κ). Results Multi-rater reliability estimates ranged from κ = 0.27 to 0.92 across all items. Six items were recorded with good reliability (κ > 0.60), three with moderate reliability (κ = 0.41 - 0.60), and two with fair reliability (κ = 0.21 - 0.40). Raters found it difficult to agree about the spectrum of patients included in a study (Item 1) and the correct application and interpretation of the test (Item 10). Conclusions In this study, we found that QAREL was a reliable assessment tool for studies of diagnostic reliability when raters agreed upon criteria for the interpretation of each item. Nine out of 11 items had good or moderate reliability, and two items achieved fair reliability. The heterogeneity in the tests included in this study may have resulted in an underestimation of the reliability of these two items. 
We discuss these and other factors that could affect our results and make recommendations for the use of QAREL.
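The abstract above reports a multi-rater kappa without naming the variant. As a minimal sketch, assuming Fleiss' kappa (one common multi-rater statistic, which may not be the one the authors used), the calculation for three raters scoring a single item looks like this. The function names, the example ratings, and the `reliability_band` helper are hypothetical; the band cut-offs follow the ranges quoted in the abstract.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")

    # Overall proportion of ratings falling in each category.
    total = n_subjects * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per subject, averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1.0 - p_exp)


def reliability_band(kappa: float) -> str:
    """Label kappa using the bands quoted in the abstract (hypothetical helper)."""
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "below fair"


# Hypothetical ratings: 4 studies, 3 raters, categories (yes, no) for one item.
example = [[3, 0], [2, 1], [1, 2], [0, 3]]
k = fleiss_kappa(example)
print(round(k, 3), reliability_band(k))
```

Computing kappa per item, as the study did, keeps disagreement on one item from being masked by agreement on the others.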
|
69
|
Rychetnik L, Carter SM, Barratt A, Irwig L. Expanding the evidence on cancer screening: the value of scientific, social and ethical perspectives. Med J Aust 2013; 198:536-9. [PMID: 23725267 DOI: 10.5694/mja12.11275] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
We propose an expanded approach to evidence for cancer screening policy and practice. First, we need to better understand why and how screening happens the way it does, sometimes at odds with evidence of benefits and harms. Second, we need to systematically investigate the ethics of cancer screening to illuminate moral concerns and expand the scope of screening research to address ethical dilemmas. An expanded approach will offer essential information to better support well reasoned judgements, and develop more accountable and less contested cancer screening policies.
|
70
|
Bell KJL, Hayen A, Irwig L, Takahashi O, Ohde S, Glasziou P. When to remeasure cardiovascular risk in untreated people at low and intermediate risk: observational study. BMJ 2013; 346:f1895. [PMID: 23553971 DOI: 10.1136/bmj.f1895] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/03/2022]
Abstract
OBJECTIVE To estimate the probability of becoming high risk for cardiovascular disease among people at low and intermediate risk and not being treated for high blood pressure or lipid levels. DESIGN Observational study. SETTING General communities in Japan and the United States. PARTICIPANTS 13,757 participants of the Tokyo health check-up study and 3855 of the Framingham studies aged 30-74 years with complete data on risk equation covariates, not receiving blood pressure or cholesterol lowering treatment, and with an estimated risk of cardiovascular disease <20% within 10 years. We stratified participants on the basis of baseline risk: <5%, 5-<10%, 10-<15%, and 15-<20%. We used follow-up measurements from the Tokyo study done annually over three years (2006-10) and follow-up visits in the Framingham study done between eight (1968-75) and 19 years (1990-1995) after baseline. MAIN OUTCOME MEASURE Estimated 10 year risk of a cardiovascular event >20% using the Framingham equation. RESULTS At baseline most participants had <5% risk (60.6% of Tokyo cohort and 45.7% of Framingham cohort) or 5-<10% risk (24.0% and 28.0%, respectively) of a cardiovascular event within 10 years. There was <10% probability of crossing the treatment threshold at 19, 8, and 3 years for baseline risk groups <5%, 5-<10%, and 10-<15%, respectively, and >10% probability of crossing the treatment threshold at one year for the 15-<20% baseline risk group. CONCLUSIONS Decisions on the frequency of remeasuring for cardiovascular risk should be made on the basis of baseline risk. Repeat risk estimation before 8-10 years is not warranted for most people initially not requiring treatment. However, remeasurement within a year seems warranted in those with an initial 15-<20% risk.
|
71
|
Houssami N, Abraham LA, Kerlikowske K, Buist DSM, Irwig L, Lee J, Miglioretti DL. Risk factors for second screen-detected or interval breast cancers in women with a personal history of breast cancer participating in mammography screening. Cancer Epidemiol Biomarkers Prev 2013; 22:946-61. [PMID: 23513042 DOI: 10.1158/1055-9965.epi-12-1208-t] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Women with a personal history of breast cancer (PHBC) have increased risk of an interval cancer. We aimed to identify risk factors for second (ipsilateral or contralateral) screen-detected or interval breast cancer within 1 year of screening in PHBC women. METHODS Screening mammograms from women with history of early-stage breast cancer at Breast Cancer Surveillance Consortium-affiliated facilities (1996-2008) were examined. Associations between woman-level, screen-level, and first cancer variables and the probability of a second breast cancer were modeled using multinomial logistic regression for three outcomes [screen-detected invasive breast cancer, interval invasive breast cancer, or ductal carcinoma in situ (DCIS)] relative to no second breast cancer. RESULTS There were 697 second breast cancers, of which 240 were interval cancers, among 67,819 screens in 20,941 women. In separate models for women with DCIS or invasive first cancer, first breast cancer surgery predicted all three second breast cancer outcomes (P < 0.001), and high ORs for second breast cancers (between 1.95 and 4.82) were estimated for breast conservation without radiation (relative to mastectomy). In women with invasive first breast cancer, additional variables predicted risk (P < 0.05) for at least one of the three outcomes: first-degree family history, dense breasts, longer time between mammograms, young age at first breast cancer, first breast cancer stage, and adjuvant systemic therapy for first breast cancer; and risk of interval invasive breast cancer was highest in women <40 years at first breast cancer (OR, 3.41; 1.34-8.70), those with extremely dense breasts (OR, 2.55; 1.4-4.67), and those treated with breast conservation without radiation (OR, 2.67; 1.53-4.65). CONCLUSION Although the risk of a second breast cancer is modest, our models identify risk factors for interval second breast cancer in PHBC women. 
IMPACT Our findings may guide discussion and evaluations of tailored breast screening in PHBC women, and incorporating this information into clinical decision-making warrants further research.
|
72
|
Henschke N, Maher CG, Ostelo RWJG, de Vet HCW, Macaskill P, Irwig L. Red flags to screen for malignancy in patients with low-back pain. Cochrane Database Syst Rev 2013; 2013:CD008686. [PMID: 23450586 PMCID: PMC10631455 DOI: 10.1002/14651858.cd008686.pub2] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 12/28/2022]
Abstract
BACKGROUND The identification of serious pathologies, such as spinal malignancy, is one of the primary purposes of the clinical assessment of patients with low-back pain (LBP). Clinical guidelines recommend awareness of "red flag" features from the patient's clinical history and physical examination to achieve this. However, there are limited empirical data on the diagnostic accuracy of these features and there remains very little information on how best to use them in clinical practice. OBJECTIVES To assess the diagnostic performance of clinical characteristics identified by taking a clinical history and conducting a physical examination ("red flags") to screen for spinal malignancy in patients presenting with LBP. SEARCH METHODS We searched electronic databases for primary studies (MEDLINE, EMBASE, and CINAHL) and systematic reviews (PubMed and Medion) from the earliest date until 1 April 2012. Forward and backward citation searching of eligible articles was also performed. SELECTION CRITERIA We considered studies if they compared the results of history taking and physical examination on patients with LBP with those of diagnostic imaging (magnetic resonance imaging, computed tomography, myelography). DATA COLLECTION AND ANALYSIS Two review authors independently assessed the quality of each included study with the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) tool and extracted details on patient characteristics, study design, index tests, and reference standard. Diagnostic accuracy data were presented as sensitivities and specificities with 95% confidence intervals for all index tests. MAIN RESULTS We included eight cohort studies of which six were performed in primary care (total number of patients; n = 6622), one study was from an accident and emergency setting (n = 482), and one study was from a secondary care setting (n = 257). In the six primary care studies, the prevalence of spinal malignancy ranged from 0% to 0.66%. 
Overall, data from 20 index tests were extracted and presented; however, only seven of these were evaluated by more than one study. Because of the limited number of studies and clinical heterogeneity, statistical pooling of diagnostic accuracy data was not performed. There was some evidence from individual studies that having a previous history of cancer meaningfully increases the probability of malignancy. Most "red flags" such as insidious onset, age > 50, and failure to improve after one month have high false positive rates. All of the tests were evaluated in isolation and no study presented data on a combination of positive tests to identify spinal malignancy. AUTHORS' CONCLUSIONS For most "red flags," there is insufficient evidence to provide recommendations regarding their diagnostic accuracy or usefulness for detecting spinal malignancy. The available evidence indicates that in patients with LBP, an indication of spinal malignancy should not be based on the results of one single "red flag" question. Further research to evaluate the performance of different combinations of tests is recommended.
|
73
|
Williams CM, Henschke N, Maher CG, van Tulder MW, Koes BW, Macaskill P, Irwig L. Red flags to screen for vertebral fracture in patients presenting with low-back pain. Cochrane Database Syst Rev 2013:CD008643. [PMID: 23440831 DOI: 10.1002/14651858.cd008643.pub2] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/28/2022]
Abstract
BACKGROUND Low-back pain (LBP) is a common condition seen in primary care. A principal aim during a clinical examination is to identify patients with a higher likelihood of underlying serious pathology, such as vertebral fracture, who may require additional investigation and specific treatment. All 'evidence-based' clinical practice guidelines recommend the use of red flags to screen for serious causes of back pain. However, it remains unclear if the diagnostic accuracy of red flags is sufficient to support this recommendation. OBJECTIVES To assess the diagnostic accuracy of red flags obtained in a clinical history or physical examination to screen for vertebral fracture in patients presenting with LBP. SEARCH METHODS Electronic databases were searched for primary studies between the earliest date and 7 March 2012. Forward and backward citation searching of eligible studies was also conducted. SELECTION CRITERIA Studies were considered if they compared the results of any aspect of the history or test conducted in the physical examination of patients presenting for LBP or examination of the lumbar spine, with a reference standard (diagnostic imaging). The selection criteria were independently applied by two review authors. DATA COLLECTION AND ANALYSIS Three review authors independently conducted 'Risk of bias' assessment and data extraction. Risk of bias was assessed using the 11-item QUADAS tool. Characteristics of studies, patients, index tests and reference standards were extracted. Where available, raw data were used to calculate sensitivity and specificity with 95% confidence intervals (CI). Due to the heterogeneity of studies and tests, statistical pooling was not appropriate and the analysis for the review was descriptive only. Likelihood ratios for each test were calculated and used as an indication of clinical usefulness. 
MAIN RESULTS Eight studies set in primary (four), secondary (one) and tertiary care (accident and emergency = three) were included in the review. Overall, the risk of bias of studies was moderate, with high risk of selection and verification bias the predominant flaws. Reporting of index and reference tests was poor. The prevalence of vertebral fracture in accident and emergency settings ranged from 6.5% to 11% and in primary care from 0.7% to 4.5%. There were 29 groups of index tests investigated; however, only two featured in more than two studies. Descriptive analyses revealed that three red flags in primary care were potentially useful with meaningful positive likelihood ratios (LR+) but mostly imprecise estimates (significant trauma, older age, corticosteroid use; LR+ point estimate ranging 3.42 to 12.85, 3.69 to 9.39, 3.97 to 48.50 respectively). One red flag in tertiary care appeared informative (contusion/abrasion; LR+ 31.09, 95% CI 18.25 to 52.96). The results of combined tests appeared more informative than individual red flags with LR+ estimates generally greater in magnitude and precision. AUTHORS' CONCLUSIONS The available evidence does not support the use of many red flags to specifically screen for vertebral fracture in patients presenting for LBP. Based on evidence from single studies, few individual red flags appear informative as most have poor diagnostic accuracy as indicated by imprecise estimates of likelihood ratios. When combinations of red flags were used the performance appeared to improve. From the limited evidence, the findings give rise to a weak recommendation that a combination of a small subset of red flags may be useful to screen for vertebral fracture. It should also be noted that many red flags have high false positive rates; and if acted upon uncritically there would be consequences for the cost of management and outcomes of patients with LBP. 
Further research should focus on appropriate sets of red flags and adequate reporting of both index and reference tests.
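The review above derives likelihood ratios from sensitivity and specificity and uses them to judge clinical usefulness. A minimal sketch of that arithmetic follows, using the standard definitions LR+ = sensitivity / (1 - specificity) and post-test odds = pre-test odds × LR. The function names are hypothetical, and pairing the reported LR+ of 31.09 with the 6.5% lower-bound prevalence is an illustration only, not a result reported by the review.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = P(test negative | disease) / P(test negative | no disease)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1]")
    return (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be in [0, 1)")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Illustration only: LR+ for contusion/abrasion (31.09) applied to a 6.5%
# pre-test probability, the lower end of the reported emergency prevalence.
print(round(post_test_probability(0.065, 31.09), 3))
```

The same LR+ applied at primary care prevalence (0.7% to 4.5%) gives a much lower post-test probability, which is why a red flag's usefulness depends on the setting.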
|
74
|
Hersch J, Jansen J, Barratt A, Irwig L, Houssami N, Howard K, Dhillon H, McCaffery K. Women's views on overdiagnosis in breast cancer screening: a qualitative study. BMJ 2013; 346:f158. [PMID: 23344309 PMCID: PMC3552499 DOI: 10.1136/bmj.f158] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Indexed: 11/04/2022]
Abstract
OBJECTIVE To elicit women's responses to information about the nature and extent of overdiagnosis in mammography screening (detecting disease that would not present clinically during the woman's lifetime) and explore how awareness of overdiagnosis might influence attitudes and intentions about screening. DESIGN Qualitative study using focus groups that included a presentation explaining overdiagnosis, incorporating different published estimates of its rate (1-10%, 30%, 50%) and information on the mortality benefit of screening, with guided group discussions. SETTING Sydney, Australia. PARTICIPANTS Fifty women aged 40-79 years with no personal history of breast cancer and with varying levels of education and participation in screening. RESULTS Prior awareness of breast cancer overdiagnosis was minimal. Women generally reacted with surprise, but most came to understand the issue. Responses to overdiagnosis and the different estimates of its magnitude were diverse. The highest estimate (50%) made some women perceive a need for more careful personal decision making about screening. In contrast, the lower and intermediate estimates (1-10% and 30%) had limited impact on attitudes and intentions, with many women remaining committed to screening. For some women, the information raised concerns, not about whether to screen but whether to treat a screen detected cancer or consider alternative approaches (such as watchful waiting). Information preferences varied: many women considered it important to take overdiagnosis into account and make informed choices about whether to have screening, but many wanted to be encouraged to be screened. CONCLUSIONS Women from a range of socioeconomic backgrounds could comprehend the issue of overdiagnosis in mammography screening, and they generally valued information about it. Effects on screening intentions may depend heavily on the rate of overdiagnosis. 
Overdiagnosis will be new and counterintuitive for many people and may influence screening and treatment decisions in unintended ways, underscoring the need for careful communication.
|
75
|
Marinovich ML, Houssami N, Macaskill P, Sardanelli F, Irwig L, Mamounas EP, von Minckwitz G, Brennan ME, Ciatto S. Meta-analysis of magnetic resonance imaging in detecting residual breast cancer after neoadjuvant therapy. J Natl Cancer Inst 2013; 105:321-33. [PMID: 23297042 DOI: 10.1093/jnci/djs528] [Citation(s) in RCA: 251] [Impact Index Per Article: 22.8] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND It has been proposed that magnetic resonance imaging (MRI) be used to guide breast cancer surgery by differentiating residual tumor from pathologic complete response (pCR) after neoadjuvant chemotherapy. This meta-analysis examines MRI accuracy in detecting residual tumor, investigates variables potentially affecting MRI performance, and compares MRI with other tests. METHODS A systematic literature search was undertaken. Hierarchical summary receiver operating characteristic (HSROC) models were used to estimate (relative) diagnostic odds ratios ([R]DORs). Summary sensitivity (correct identification of residual tumor), specificity (correct identification of pCR), and areas under the SROC curves (AUCs) were derived. All statistical tests were two-sided. RESULTS Forty-four studies (2050 patients) were included. The overall AUC of MRI was 0.88. Accuracy was lower for "standard" pCR definitions (referent category) than "less clearly described" (RDOR = 2.41, 95% confidence interval [CI] = 1.11 to 5.23) or "near-pCR" definitions (RDOR = 2.60, 95% CI = 0.73 to 9.24; P = .03). Corresponding AUCs were 0.83, 0.90, and 0.91. Specificity was higher when negative MRI was defined as contrast enhancement less than or equal to normal tissue (0.83, 95% CI = 0.64 to 0.93) vs no enhancement (0.54, 95% CI = 0.39 to 0.69; P = .02), with comparable sensitivity (0.83, 95% CI = 0.69 to 0.91; vs 0.87, 95% CI = 0.80 to 0.92; P = .45). MRI had higher accuracy than mammography (P = .02); there was only weak evidence that MRI had higher accuracy than clinical examination (P = .10). No difference in MRI and ultrasound accuracy was found (P = .15). CONCLUSIONS MRI accurately detects residual tumor after neoadjuvant chemotherapy. Accuracy was lower when pCR was more rigorously defined, and specificity was lower when test negativity thresholds were more stringent; these definitions require standardization. 
MRI is more accurate than mammography; however, studies comparing MRI and ultrasound are required.
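The meta-analysis above estimates diagnostic odds ratios with HSROC models, which this note does not attempt to reproduce. As a rough sketch of what a DOR measures, the textbook point formula DOR = [sens / (1 - sens)] × [spec / (1 - spec)] can be applied to the summary sensitivity and specificity pairs quoted for the two negativity thresholds. The function name is hypothetical and the outputs are back-of-envelope values, not estimates from the paper.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in the diseased over the odds in the non-diseased."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))


# Summary pairs quoted in the abstract for the two definitions of a negative MRI.
thresholds = {
    "enhancement <= normal tissue": (0.83, 0.83),
    "no enhancement": (0.87, 0.54),
}
for label, (sens, spec) in thresholds.items():
    print(label, round(diagnostic_odds_ratio(sens, spec), 2))
```

A DOR of 1 means the test does not discriminate at all; the formula collapses sensitivity and specificity into one number, so two thresholds with different trade-offs can still be compared on a single scale.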
|