1
|
Association Between Nasopharyngeal Colonization and Clinical Outcome in Children With Acute Otitis Media. Pediatr Infect Dis J 2023; 42:e274-e277. [PMID: 37171965 PMCID: PMC10523893 DOI: 10.1097/inf.0000000000003956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2023]
Abstract
BACKGROUND Young children with acute otitis media (AOM) frequently exhibit nasopharyngeal colonization with either Streptococcus pneumoniae, Haemophilus influenzae or both pathogens. We aimed to determine if antibiotics could be spared or shortened in those without nasopharyngeal colonization with either pathogen. METHODS In 2 separate randomized clinical trials in children aged 6-23 months with stringently-diagnosed AOM, we performed bacterial cultures on nasopharyngeal specimens collected at the time of diagnosis. In the first trial, we compared the efficacy of amoxicillin/clavulanate (amox/clav) administered for 10 days vs. that of placebo, and in the second trial, we compared the efficacy of amox/clav administered for 10 days vs. 5 days. In each trial, we classified children as being colonized with both S. pneumoniae and H. influenzae, S. pneumoniae alone, H. influenzae alone, or neither pathogen, and as experiencing either clinical success or clinical failure at the end-of-therapy visit, based on previously reported a priori criteria. RESULTS We evaluated 796 children. Among children randomized to amox/clav, those colonized with either S. pneumoniae or H. influenzae or both were approximately twice as likely to experience clinical failure as children not colonized with either pathogen (odds ratio: 1.8; confidence intervals: 1.2-2.9). In contrast, among children randomized to placebo, clinical failure at the end-of-therapy visit was not associated with nasopharyngeal culture results at the time of diagnosis. CONCLUSIONS Children colonized with either S. pneumoniae or H. influenzae or both have a greater chance of treatment failure than children colonized with neither pathogen.
|
2
|
Randomized Trial of Tocilizumab in the Treatment of Refractory Adult Polymyositis and Dermatomyositis. ACR Open Rheumatol 2022; 4:983-990. [PMID: 36128663 PMCID: PMC9661830 DOI: 10.1002/acr2.11493] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/27/2022] [Revised: 06/10/2022] [Accepted: 07/06/2022] [Indexed: 11/21/2022] Open
Abstract
Objective To assess the efficacy and tolerability of tocilizumab in a multicenter, randomized, double‐blind, placebo‐controlled trial in refractory adult patients with dermatomyositis (DM) and polymyositis (PM). Methods Thirty‐six subjects with probable or definite DM/PM were enrolled in a 6‐month phase 2B clinical trial and randomized 1:1 to receive tocilizumab (8 mg/kg intravenously) or placebo every 4 weeks for 24 weeks. Eligible subjects had either a DM rash, a myositis‐associated autoantibody or an adjudicated PM diagnosis. Active disease was defined by at least three of six abnormal core set measures (CSMs), including a manual muscle testing (MMT)‐8 score of less than 136/150. If the MMT‐8 score was greater than 136, then a cutaneous score of 3 or more (10 cm visual analogue scale) was required along with three additional abnormal CSMs indicating disease activity. The primary endpoint compared the Total Improvement Score (TIS) between both arms from week 4 to 24. Secondary outcomes included time to meeting minimal TIS improvement, changes in CSMs, time to worsening, steroid‐sparing effect, proportion of subjects meeting more stringent improvement criteria, and safety outcomes. Results There was no significant difference (P = 0.86) in the TIS over 24 weeks between tocilizumab and placebo arms. The secondary endpoints of time to improvement (minimal, moderate, or major), time to worsening, CSM changes, safety outcomes, and steroid‐sparing effect were also not significantly different between arms. Conclusion Tocilizumab was safe and well tolerated but did not meet the primary or secondary efficacy outcomes in refractory DM and PM in this 24‐week phase 2B study.
|
3
|
Stopping rules for long-term clinical trials based on two consecutive rejections of the null hypothesis. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2015.1019142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
4
|
Abstract
BACKGROUND Limiting the duration of antimicrobial treatment constitutes a potential strategy to reduce the risk of antimicrobial resistance among children with acute otitis media. METHODS We assigned 520 children, 6 to 23 months of age, with acute otitis media to receive amoxicillin-clavulanate either for a standard duration of 10 days or for a reduced duration of 5 days followed by placebo for 5 days. We measured rates of clinical response (in a systematic fashion, on the basis of signs and symptomatic response), recurrence, and nasopharyngeal colonization, and we analyzed episode outcomes using a noninferiority approach. Symptom scores ranged from 0 to 14, with higher numbers indicating more severe symptoms. RESULTS Children who were treated with amoxicillin-clavulanate for 5 days were more likely than those who were treated for 10 days to have clinical failure (77 of 229 children [34%] vs. 39 of 238 [16%]; difference, 17 percentage points [based on unrounded data]; 95% confidence interval, 9 to 25). The mean symptom scores over the period from day 6 to day 14 were 1.61 in the 5-day group and 1.34 in the 10-day group (P=0.07); the mean scores at the day-12-to-14 assessment were 1.89 versus 1.20 (P=0.001). The percentage of children whose symptom scores decreased more than 50% (indicating less severe symptoms) from baseline to the end of treatment was lower in the 5-day group than in the 10-day group (181 of 227 children [80%] vs. 211 of 233 [91%], P=0.003). We found no significant between-group differences in rates of recurrence, adverse events, or nasopharyngeal colonization with penicillin-nonsusceptible pathogens. Clinical-failure rates were greater among children who had been exposed to three or more children for 10 or more hours per week than among those with less exposure (P=0.02) and were also greater among children with infection in both ears than among those with infection in one ear (P<0.001). 
CONCLUSIONS Among children 6 to 23 months of age with acute otitis media, reduced-duration antimicrobial treatment resulted in less favorable outcomes than standard-duration treatment; in addition, neither the rate of adverse events nor the rate of emergence of antimicrobial resistance was lower with the shorter regimen. (Funded by the National Institute of Allergy and Infectious Diseases and the National Center for Research Resources; ClinicalTrials.gov number, NCT01511107.)
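The clinical-failure comparison reported in this abstract can be reproduced approximately from the published counts. The following is a minimal sketch, assuming a simple Wald (normal-approximation) interval; the abstract does not say which interval method the trial used, so the bounds may differ slightly from the published 9 to 25 percentage points. The function name `risk_difference_ci` is illustrative only.

```python
import math


def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Return (difference, lower, upper) for proportion A minus proportion B.

    Uses a Wald normal-approximation interval. This is an assumption for
    illustration, not necessarily the method used in the trial.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Counts taken from the abstract: 5-day group 77 of 229, 10-day group 39 of 238.
diff, lower, upper = risk_difference_ci(77, 229, 39, 238)
print(f"difference = {100 * diff:.1f} pp, "
      f"95% CI {100 * lower:.1f} to {100 * upper:.1f} pp")
```

The point estimate is about 17 percentage points, in line with the abstract; the lower bound of the whole interval stays well above zero, which is why the shorter regimen could not be declared noninferior.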
|
5
|
|
6
|
|
7
|
Toward an Improved Scale for Assessing Symptom Severity in Children With Acute Otitis Media. J Pediatric Infect Dis Soc 2015; 4:367-9. [PMID: 26582877 DOI: 10.1093/jpids/piu062] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/16/2014] [Accepted: 05/30/2014] [Indexed: 11/12/2022]
Abstract
The objective of the present study was to determine whether changes in the previously developed 7-item Acute Otitis Media Severity of Symptoms scale could improve its responsiveness and its longitudinal construct validity. The items "diminished activity" and "diminished appetite" had low or borderline levels of responsiveness and longitudinal construct validity. Dropping these items seems to be potentially advantageous.
|
8
|
Abstract
BACKGROUND We previously developed and validated the acute otitis media severity of symptom (AOM-SOS) scale for rating symptoms of AOM in young children. In this report, we sought to estimate the minimal important difference (MID) for change in AOM-SOS scores. METHODS In a group of children 6-24 months of age with AOM enrolled in a recently reported placebo-controlled clinical trial of antibiotic efficacy, we compared changes in AOM-SOS scores with parental assessments of change over a 24-hour period. Mean absolute and mean relative change in scores in children reportedly exhibiting only a small degree of improvement were considered in arriving at an estimated MID. We then compared the proportions of children in the antibiotic and placebo groups, respectively, whose AOM-SOS scores changed more than the estimated MID at various time points after enrollment. RESULTS Data were available for 277 children. Children whose parents reported only a small degree of improvement 24 hours after enrollment had a mean decrease in AOM-SOS score of 3.8, or 55%, from baseline. We found the relative decrease more telling than the absolute decrease. The proportions of children in the antibiotic and placebo groups, respectively, whose AOM-SOS scores had decreased <55% on Day 7 were 12.3 and 23.8% (P=0.02), and during Days 4-7 were 28 and 40% (P=0.046). CONCLUSIONS We estimated the MID for change in AOM-SOS scores in young children and described use of the MID as an added metric in interpreting results from a clinical trial of antibiotic efficacy.
|
9
|
In reference to What is the Role of Tympanostomy Tubes in the Treatment of Recurrent Acute Otitis Media? Laryngoscope 2013; 123:E127. [DOI: 10.1002/lary.24142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/08/2013] [Revised: 03/18/2013] [Accepted: 03/18/2013] [Indexed: 11/09/2022]
|
10
|
Abstract
PURPOSE To develop an approach for estimating a subject-centered free-response receiver operating characteristic (FROC) curve that provides patient-centered inferences regarding the detection-localization characteristics of a diagnostic system. METHODS The authors examine properties of the conventional, target-centered FROC curve and demonstrate that in scenarios where the diagnostic performance correlates with the total number of targets on a subject, it may lead to inadequate inferences from the perspective of possible benefits to a patient. Following solutions to patient-centered approaches in other applications, the authors define a subject-centered FROC curve and develop its formulation as a covariate-adjusted FROC curve. The authors also conduct a numerical study illustrating the relative properties of the conventional and subject-centered approaches and provide an example. RESULTS A simple-to-implement approach for estimating the subject-centered FROC curve and its overall index can be formulated as a type of stratified FROC analysis. The authors demonstrate that when diagnostic performance is associated with the number of targets, the diagnostic system with apparently superior target-centered characteristics (conventional approach) can actually be inferior from the subject-centered perspective. The authors show that under some clinically reasonable conditions the magnitude of disagreement in results could be substantial. An example from an actual observer performance study illustrates the natural setting where the developed approach would be relevant and lead to conclusions that are contradictory to those obtained from conventional analysis. CONCLUSIONS The subject-centered FROC curve developed by the authors, together with its overall index, provides tools for inferences that may be relevant from the perspective of potential benefits to a patient.
|
11
|
Rituximab in the treatment of refractory adult and juvenile dermatomyositis and adult polymyositis: a randomized, placebo-phase trial. Arthritis Rheum 2013; 65:314-24. [PMID: 23124935 DOI: 10.1002/art.37754] [Citation(s) in RCA: 405] [Impact Index Per Article: 36.8] [Received: 12/19/2011] [Accepted: 10/11/2012] [Indexed: 12/14/2022]
Abstract
OBJECTIVE To assess the safety and efficacy of rituximab in a randomized, double-blind, placebo-phase trial in adult and pediatric myositis patients. METHODS Adults with refractory polymyositis (PM) and adults and children with refractory dermatomyositis (DM) were enrolled. Entry criteria included muscle weakness and ≥2 additional abnormal values on core set measures (CSMs) for adults. Juvenile DM patients required ≥3 abnormal CSMs, with or without muscle weakness. Patients were randomized to receive either rituximab early or rituximab late, and glucocorticoid or immunosuppressive therapy was allowed at study entry. The primary end point compared the time to achieve the International Myositis Assessment and Clinical Studies Group preliminary definition of improvement (DOI) between the 2 groups. The secondary end points were the time to achieve ≥20% improvement in muscle strength and the proportions of patients in the early and late rituximab groups achieving the DOI at week 8. RESULTS Among 200 randomized patients (76 with PM, 76 with DM, and 48 with juvenile DM), 195 showed no difference in the time to achieving the DOI between the rituximab late (n = 102) and rituximab early (n = 93) groups (P = 0.74 by log rank test), with a median time to achieving a DOI of 20.2 weeks and 20.0 weeks, respectively. The secondary end points also did not significantly differ between the 2 treatment groups. However, 161 (83%) of the randomized patients met the DOI, and individual CSMs improved in both groups throughout the 44-week trial. CONCLUSION Although there were no significant differences in the 2 treatment arms for the primary and secondary end points, 83% of adult and juvenile myositis patients with refractory disease met the DOI. The role of B cell-depleting therapies in myositis warrants further study, with consideration for a different trial design.
|
12
|
On use of partial area under the ROC curve for evaluation of diagnostic performance. Stat Med 2013; 32:3449-58. [PMID: 23508757 DOI: 10.1002/sim.5777] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Received: 06/19/2012] [Revised: 01/12/2013] [Accepted: 02/11/2013] [Indexed: 11/09/2022]
Abstract
Evaluation of diagnostic performance is a necessary component of new developments in many fields including medical diagnostics and decision making. The methodology for statistical analysis of diagnostic performance continues to develop, offering new analytical tools for conventional inferences and solutions for novel and increasingly more practically relevant questions. In this paper, we focus on the partial area under the Receiver Operating Characteristic (ROC) curve or pAUC. This summary index is considered to be more practically relevant than the area under the entire ROC curve (AUC), but because of several perceived limitations, it is not used as often. To improve interpretation, results for pAUC analysis are frequently reported using a rescaled index such as the standardized partial AUC proposed by McClish (1989). We derive two important properties of the relationship between the 'standardized' pAUC and the defined range of interest, which could facilitate a wider and more appropriate use of this important summary index. First, we mathematically prove that the 'standardized' pAUC increases with increasing range of interest for practically common ROC curves. Second, using comprehensive numerical investigations, we demonstrate that, contrary to common belief, the uncertainty about the estimated standardized pAUC can either decrease or increase with an increasing range of interest. Our results indicate that the partial AUC could frequently offer advantages in terms of statistical uncertainty of the estimation. In addition, selection of a wider range of interest will likely lead to an increased estimate even for standardized pAUC.
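The first property stated in this abstract can be illustrated numerically. The sketch below assumes the usual McClish rescaling, in which the partial area over a false-positive range is mapped so that chance performance gives 0.5 and perfect performance gives 1, and it restricts the range of interest to [0, fpr_max]. The ROC curve `example_roc` (TPR equal to the square root of FPR) is a hypothetical concave curve chosen only for illustration; it does not come from the paper.

```python
import math


def partial_auc(roc, fpr_max, steps=20000):
    """Midpoint-rule area under roc(fpr) for fpr in [0, fpr_max]."""
    h = fpr_max / steps
    return sum(roc((i + 0.5) * h) for i in range(steps)) * h


def standardized_pauc(roc, fpr_max):
    """McClish-style rescaling of the partial AUC over [0, fpr_max].

    Chance performance maps to 0.5 and perfect performance maps to 1.
    """
    area = partial_auc(roc, fpr_max)
    area_chance = fpr_max ** 2 / 2   # area under the chance diagonal
    area_perfect = fpr_max           # area under TPR = 1
    return 0.5 * (1 + (area - area_chance) / (area_perfect - area_chance))


# Hypothetical concave ROC curve, used only for illustration.
example_roc = math.sqrt

for fpr_max in (0.1, 0.2, 0.5, 1.0):
    print(fpr_max, round(standardized_pauc(example_roc, fpr_max), 3))
```

For this curve the standardized index grows as the range widens and reaches the full AUC (2/3) when the range covers the whole FPR axis, which is consistent with the monotonicity result described in the abstract.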
|
13
|
Identifying children with vesicoureteral reflux: a comparison of 2 approaches. J Urol 2012; 188:1895-9. [PMID: 22998917 DOI: 10.1016/j.juro.2012.07.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 03/23/2012] [Indexed: 10/27/2022]
Abstract
PURPOSE Various screening approaches have been proposed to identify the subgroup of children with urinary tract infection who have vesicoureteral reflux. However, few studies have compared the sensitivity of screening approaches in a representative population of young children. We compared the sensitivities of the top-down ((99m)technetium dimercaptosuccinic acid renal scan to screen) and biomarker based (C-reactive protein level at presentation) approaches in identifying children with vesicoureteral reflux. MATERIALS AND METHODS We calculated the sensitivity of the 2 screening approaches in detecting vesicoureteral reflux and subsequently high grade (III or greater) vesicoureteral reflux in children. RESULTS The top-down and C-reactive protein based approaches missed 33% and 29% of cases of high grade vesicoureteral reflux, respectively. CONCLUSIONS The sensitivity of the top-down approach for detecting high grade vesicoureteral reflux was lower than previously reported. Further study of novel methods to identify children at risk for renal scarring is warranted.
|
14
|
Development of an algorithm for the diagnosis of otitis media. Acad Pediatr 2012; 12:214-8. [PMID: 22459064 DOI: 10.1016/j.acap.2012.01.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 08/22/2011] [Revised: 01/19/2012] [Accepted: 01/19/2012] [Indexed: 11/29/2022]
Abstract
BACKGROUND The relative importance of signs and symptoms in the diagnosis of otitis media has not been adequately evaluated. This has led to a large degree of variation in the criteria used to diagnose otitis media, which has resulted in inconsistencies in clinical care and discrepant research findings. METHODS A group of experienced otoscopists examined children presenting for primary care. We investigated the signs and symptoms that these otoscopists used to distinguish acute otitis media (AOM), otitis media with effusion (OME), and no effusion. We used recursive partitioning to develop a diagnostic algorithm. To assess the algorithm, we validated it in an independent dataset. RESULTS Bulging of the tympanic membrane (TM) was the main finding that otoscopists used to discriminate AOM from OME; information regarding the presence or absence of other signs and symptoms added little to the diagnostic process. Overall, 92% of children with AOM had a bulging TM compared with 0% of children with OME. Opacification and/or an air-fluid level was the main finding that the otoscopists used to discriminate OME from no effusion; 97% of children diagnosed with OME had an opaque TM compared with 5% of children diagnosed with no effusion. An algorithm that used bulging and opacification of the TM correctly classified 99% of ears in an independent dataset. CONCLUSIONS Bulging of the TM was the finding that best discriminated AOM from OME. The algorithm developed here may prove to be useful in clinical care, research, and education concerning otitis media.
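The two discriminating findings named in this abstract suggest a two-step decision rule. The sketch below is a hypothetical reading of that rule, not the published algorithm: the abstract does not give the actual tree, its split order, or any additional branches, and the function and parameter names are invented for illustration.

```python
def classify_ear(bulging, opaque_or_air_fluid_level):
    """Hypothetical two-step rule built from the findings named in the abstract.

    bulging: True if the tympanic membrane is bulging.
    opaque_or_air_fluid_level: True if the membrane is opaque and/or an
        air-fluid level is visible.
    """
    if bulging:
        # Bulging was the main finding separating AOM from OME.
        return "AOM"
    if opaque_or_air_fluid_level:
        # Opacification and/or an air-fluid level separated OME from no effusion.
        return "OME"
    return "no effusion"


print(classify_ear(bulging=True, opaque_or_air_fluid_level=True))
print(classify_ear(bulging=False, opaque_or_air_fluid_level=True))
print(classify_ear(bulging=False, opaque_or_air_fluid_level=False))
```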
|
15
|
Abstract
OBJECTIVE To describe the pain associated with diagnostic tympanocentesis and to gather preliminary data comparing the efficacy of 3 methods of pain reduction for tympanocentesis. METHODS In children 6 to 36 months of age undergoing tympanocentesis for acute otitis media, the authors measured pain and distress throughout all phases of the procedure and recovery using physiological (heart rate) and behavioral measures (cry duration, Global Mood Scale score, and pain visual analog scales). They compared--in a randomized controlled trial--3 pain reduction interventions: acetaminophen, acetaminophen plus codeine, and ibuprofen plus midazolam. RESULTS Heart rate increased throughout the procedure, peaking during needle aspiration. Children treated with acetaminophen alone had higher peak heart rates and Global Mood Scale scores during parts of the procedure. CONCLUSIONS Acetaminophen alone may not be as effective in reducing pain-related physiological and behavioral changes as acetaminophen plus codeine or ibuprofen plus midazolam during diagnostic tympanocentesis.
|
16
|
Pneumococcal resistance and serotype 19A in Pittsburgh-area children with acute otitis media before and after introduction of 7-valent pneumococcal polysaccharide vaccine. Clin Pediatr (Phila) 2011; 50:114-20. [PMID: 21098526 DOI: 10.1177/0009922810384259] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
METHODS Before and after introduction of pneumococcal conjugate vaccine (PCV7), the authors obtained nasopharyngeal (NP) specimens from 3 groups of children aged 6 to 23 months with acute otitis media (AOM): group 1 (pre-PCV7), group 2 (early post-PCV7), and group 3 (late post-PCV7). RESULTS Of the Streptococcus pneumoniae isolates, the proportion that were vaccine serotypes (VTs) declined progressively (60.4% vs 48.6% vs 5.2% in groups 1, 2, and 3, respectively; P < .001). Concurrently, increases occurred in the proportion of penicillin-nonsusceptible isolates (minimum inhibitory concentration >0.1 µg/mL; 26.7% vs 37.8% vs. 38.5%; P = .12); the proportion of isolates that were serotype 19A (4.0% vs 0% vs 25.9%; P < .001); and the proportion of 19A isolates that were penicillin-nonsusceptible (0% in group 1, 68.6% in group 3; P = .004). CONCLUSION Shifts in pneumococcal serotype distribution and increases in penicillin nonsusceptibility among pneumococcal isolates from children with AOM underscore the need for continuing bacteriological surveillance for future vaccine development.
|
17
|
Reply. Acad Radiol 2011. [DOI: 10.1016/j.acra.2010.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
18
|
Use of likelihood ratios for comparisons of binary diagnostic tests: underlying ROC curves. Med Phys 2011; 37:5821-30. [PMID: 21158294 DOI: 10.1118/1.3503849] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022] Open
Abstract
PURPOSE When comparing binary test results from two diagnostic systems, superiority in both "sensitivity" and "specificity" also implies differences in all conventional summary indices and locally in the underlying receiver operating characteristics (ROC) curves. However, when one of the two binary tests has higher sensitivity and lower specificity (or vice versa), comparisons of their performance levels are nontrivial and the use of different summary indices may lead to contradictory conclusions. A frequently used approach that is free of subjectivity associated with summary indices is based on the comparison of the underlying ROC curves that requires the collection of rating data using multicategory scales, whether natural or experimentally imposed. However, data for reliable estimation of ROC curves are frequently unavailable. The purpose of this article is to develop an approach of using "diagnostic likelihood ratios", namely, likelihood ratios of "positive" or "negative" responses, to make simple inferences regarding the underlying ROC curves and associated areas in the absence of reliable rating data or regarding the relative binary characteristics, when these are of primary interest. METHODS For inferences related to underlying curves, the authors exploit the assumption of concavity of the true underlying ROC curve to describe conditions under which these curves have to be different and under which the curves have different areas. For scenarios when the binary characteristics are of primary interest, the authors use characteristics of "chance performance" to demonstrate that the derived conditions provide strong evidence of superiority of one binary test as compared to another. By relating these derived conditions to hypotheses about the true likelihood ratios of two binary diagnostic tests being compared, the authors enable a straightforward statistical procedure for the corresponding inferences. 
RESULTS The authors derived simple algebraic and graphical methods for describing the conditions for superiority of one of two diagnostic tests with respect to their binary characteristics, the underlying ROC curves, or the areas under the curves. The graphical regions are useful for identifying potential differences between two systems, which then have to be tested statistically. The simple statistical tests can be performed with well known methods for comparison of diagnostic likelihood ratios. The developed approach offers a solution for some of the more difficult to analyze scenarios, where diagnostic tests do not demonstrate concordant differences in terms of both sensitivity and specificity. In addition, the resulting inferences do not contradict the conclusions that can be obtained using conventional and reasonably defined summary indices. CONCLUSIONS When binary diagnostic tests are of primary interest, the proposed approach offers an objective and powerful method for comparing two binary diagnostic tests. The significant advantage of this method is that it enables objective analyses when one test has higher sensitivity but lower specificity, while ensuring agreement with study conclusions based on other reasonable and widely acceptable summary indices. For truly multicategory diagnostic tests, the proposed method can help in concluding inferiority of one of the diagnostic tests based on binary data, thereby potentially saving the need for conducting a more expensive multicategory ROC study.
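The diagnostic likelihood ratios used throughout this abstract are simple functions of sensitivity and specificity. The sketch below only computes the two ratios and notes their geometric meaning in ROC space; it does not reproduce the superiority conditions derived in the paper. The operating points for tests A and B are hypothetical values chosen for illustration.

```python
def diagnostic_likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity) is the slope of the line from
    (0, 0) to the operating point (FPR, TPR) in ROC space.
    LR- = (1 - sensitivity) / specificity is the slope of the line from
    the operating point to (1, 1).
    """
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical operating points: test B is more sensitive but less specific.
test_a = diagnostic_likelihood_ratios(sensitivity=0.70, specificity=0.90)
test_b = diagnostic_likelihood_ratios(sensitivity=0.85, specificity=0.80)

print("test A: LR+ = %.2f, LR- = %.3f" % test_a)
print("test B: LR+ = %.2f, LR- = %.3f" % test_b)
```

In this example test A has the larger LR+ while test B has the smaller (more favorable) LR-, so neither test dominates on both ratios. This is the kind of discordant scenario the abstract describes as nontrivial to analyze.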
|
19
|
Abstract
BACKGROUND Recommendations vary regarding immediate antimicrobial treatment versus watchful waiting for children younger than 2 years of age with acute otitis media. METHODS We randomly assigned 291 children 6 to 23 months of age, with acute otitis media diagnosed with the use of stringent criteria, to receive amoxicillin-clavulanate or placebo for 10 days. We measured symptomatic response and rates of clinical failure. RESULTS Among the children who received amoxicillin-clavulanate, 35% had initial resolution of symptoms by day 2, 61% by day 4, and 80% by day 7; among children who received placebo, 28% had initial resolution of symptoms by day 2, 54% by day 4, and 74% by day 7 (P=0.14 for the overall comparison). For sustained resolution of symptoms, the corresponding values were 20%, 41%, and 67% with amoxicillin-clavulanate, as compared with 14%, 36%, and 53% with placebo (P=0.04 for the overall comparison). Mean symptom scores over the first 7 days were lower for the children treated with amoxicillin-clavulanate than for those who received placebo (P=0.02). The rate of clinical failure--defined as the persistence of signs of acute infection on otoscopic examination--was also lower among the children treated with amoxicillin-clavulanate than among those who received placebo: 4% versus 23% at or before the visit on day 4 or 5 (P<0.001) and 16% versus 51% at or before the visit on day 10 to 12 (P<0.001). Mastoiditis developed in one child who received placebo. Diarrhea and diaper-area dermatitis were more common among children who received amoxicillin-clavulanate. There were no significant changes in either group in the rates of nasopharyngeal colonization with nonsusceptible Streptococcus pneumoniae. 
CONCLUSIONS Among children 6 to 23 months of age with acute otitis media, treatment with amoxicillin-clavulanate for 10 days tended to reduce the time to resolution of symptoms and reduced the overall symptom burden and the rate of persistent signs of acute infection on otoscopic examination. (Funded by the National Institute of Allergy and Infectious Diseases; ClinicalTrials.gov number, NCT00377260.)
|
20
|
Is an ROC-type response truly always better than a binary response in observer performance studies? Acad Radiol 2010; 17:639-45. [PMID: 20236840 DOI: 10.1016/j.acra.2009.12.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 10/05/2009] [Revised: 12/17/2009] [Accepted: 12/27/2009] [Indexed: 01/20/2023]
Abstract
RATIONALE AND OBJECTIVES The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis. MATERIALS AND METHODS Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates. RESULTS The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points. CONCLUSIONS The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.
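The relationship between the binary-scale AUC and Youden's index mentioned in this abstract follows from elementary geometry. With a single operating point, the empirical ROC curve is the two-segment path from (0, 0) through (FPR, TPR) to (1, 1), and the area under it equals (sensitivity + specificity) / 2, that is, (J + 1) / 2 where J is Youden's index. The sketch below checks this identity; the function names and the example operating point are illustrative assumptions.

```python
def binary_auc(sensitivity, specificity):
    """Empirical AUC for one operating point.

    Area under the path (0, 0) -> (FPR, TPR) -> (1, 1), which simplifies
    to the average of sensitivity and specificity.
    """
    return (sensitivity + specificity) / 2


def youden_index(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Hypothetical operating point for illustration.
sens, spec = 0.80, 0.90
auc = binary_auc(sens, spec)
j = youden_index(sens, spec)
print(f"AUC = {auc:.3f}, J = {j:.3f}, (J + 1) / 2 = {(j + 1) / 2:.3f}")
```

Because the binary AUC is a linear function of J, comparing two modalities by binary AUC is equivalent to comparing them by Youden's index.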
|
21
|
Time to diagnosis and performance levels during repeat interpretations of digital breast tomosynthesis: preliminary observations. Acad Radiol 2010; 17:450-5. [PMID: 20036584 DOI: 10.1016/j.acra.2009.11.011] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 09/18/2009] [Revised: 11/06/2009] [Accepted: 11/08/2009] [Indexed: 10/20/2022]
Abstract
RATIONALE AND OBJECTIVES To compare time to interpretation and diagnostic performance levels during repeat readings of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) in a retrospective study. MATERIALS AND METHODS Three experienced radiologists twice interpreted 125 selected examinations, 35 with verified cancers and 90 negative for cancer, during a period of 22 months using FFDM alone followed by a combined FFDM + DBT mode. Changes in time to "review and rate" these examinations as well as in diagnostic performance levels were assessed. A fixed-effect analysis accounting for cross-correlation due to the review of the same examinations by the same readers was performed. RESULTS The total (combined) time to review and rate an examination increased on average by 33% between the first and second readings of the same examinations (P < .001). Radiologists reduced their time to review FFDM before making the DBT available for viewing. However, they spent more time reviewing the combined FFDM + DBT mode. The recall rates for examinations depicting cancer remained largely unchanged. Among the groups of examinations with concordant and discordant recall recommendations during the two readings, only the group of examinations that were "newly recalled" during the repeat reading took significantly longer (P < .01). CONCLUSION DBT-based breast imaging may ultimately result in a substantial increase in performance; however, without efficiency improvements DBT may take longer to interpret. Addition of "false-positive recalls" was most strongly associated with an increase in interpretation time, while elimination of "false-positive recalls" did not require longer interpretation time.
|
22
|
Adenoidectomy for otitis media with effusion in 2-3-year-old children. Int J Pediatr Otorhinolaryngol 2009; 73:1718-24. [PMID: 19819563 PMCID: PMC2787742 DOI: 10.1016/j.ijporl.2009.09.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 07/14/2009] [Revised: 09/02/2009] [Accepted: 09/06/2009] [Indexed: 11/30/2022]
Abstract
OBJECTIVE To compare the efficacy of three surgical treatment combinations - myringotomy and tympanostomy tube insertion (M&T), adenoidectomy with M&T (A-M&T), and adenoidectomy with myringotomy (A-M) - in reducing middle-ear disease in young children with chronic OME. METHODS Children 24-47 months of age, with a history of bilateral middle-ear effusion (MEE) for at least 3 months, unilateral for 6 months or longer or unilateral for 3 months after extrusion of a tympanostomy tube, unresponsive to recent antibiotic, were randomly assigned to either M&T, A-M&T, or A-M. Treatment assignment was stratified by age (24-35 months, 36-47 months), nasal obstruction (no, yes) and previous history of M&T (no, yes). Subjects were followed monthly and with any signs or symptoms of ear disease for up to 36 months. RESULTS Ninety-eight subjects were randomly assigned to the three treatment groups. Fifty-six subjects (57%) were 24-35 months of age; 63% had nasal obstruction, and 36% had previously undergone M&T. During the 36 months after entry, subjects were noted to have MEE for the following percentages of time: 18.6% in the M&T group, 20.6% in the A-M&T group, and 31.1% in the A-M group (M&T vs. A-M&T, p=0.87; M&T vs. A-M, p=0.01). By 36 months, there were no differences in the number of further surgical procedures for ear disease needed among the groups. CONCLUSIONS Adenoidectomy with or without tube insertion provided no advantage to young children with chronic OME in regard to time with effusion compared to tube insertion alone. Fewer tympanostomy tubes were placed in children undergoing A-M as their initial procedure, but this should be balanced by the performance of the more invasive surgical procedure and their increased time with effusion.
|
23
|
Performance assessments of diagnostic systems under the FROC paradigm: experimental, analytical, and results interpretation issues. Acad Radiol 2009; 16:770-1. [PMID: 19427984 DOI: 10.1016/j.acra.2009.03.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/23/2009] [Revised: 03/23/2009] [Accepted: 03/23/2009] [Indexed: 10/20/2022]
|
24
|
Analyzing receiver operating characteristic curves with SAS®. Mithat Gönen, SAS Institute Inc., Cary, NC, 2007. No. of pages: x+134. Price: $31.95. ISBN 978-1-59994-298-8. Stat Med 2009. [DOI: 10.1002/sim.3500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
|
25
|
Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 2008; 35:4404-9. [PMID: 18975686 DOI: 10.1118/1.2977766] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The authors investigated radiologists' performances during retrospective interpretation of screening mammograms when using a binary decision whether to recall a woman for additional procedures or not and compared them with their receiver operating characteristic (ROC) type performance curves using a semi-continuous rating scale. Under an Institutional Review Board approved protocol nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC type rating scale (0 to 100). The vertical distance, namely the difference in sensitivity levels at the same specificity levels, between the empirical ROC curve and the binary operating point was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scale. There does not appear to be any systematic tendency of the readers towards a better performance when using either of the two rating approaches, namely four readers performed better using the semi-continuous rating scale, four readers performed better with the binary scale, and one reader had the point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from -0.046 to 0.128 with an average width of the corresponding 95% confidence intervals of 0.2 and p-values ranging for individual readers from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales in that the average distance between an individual reader's binary operating point and their ROC curve was close to zero. The 95% confidence interval for the fixed-reader average (0.016) was (-0.0206, 0.0631) (two-sided p-value 0.35). In conclusion the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.
|
26
|
Agreement of the order of overall performance levels under different reading paradigms. Acad Radiol 2008; 15:1567-73. [PMID: 19000873 DOI: 10.1016/j.acra.2008.07.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 06/10/2008] [Revised: 07/15/2008] [Accepted: 07/15/2008] [Indexed: 11/27/2022]
Abstract
RATIONALE AND OBJECTIVES To investigate consistency of the orders of performance levels when interpreting mammograms under three different reading paradigms. MATERIALS AND METHODS We performed a retrospective observer study in which nine experienced radiologists rated an enriched set of mammography examinations that they personally had read in the clinic ("individualized") mixed with a set that none of them had read in the clinic ("common set"). Examinations were interpreted under three different reading paradigms: binary using screening Breast Imaging Reporting and Data System (BI-RADS), receiver-operating characteristic (ROC), and free-response ROC (FROC). The performance in discriminating between cancer and noncancer findings under each of the paradigms was summarized using Youden's index/2+0.5 (Binary), nonparametric area under the ROC curve (AUC), and an overall FROC index (JAFROC-2). Pearson correlation coefficients were then computed to assess consistency in the ordering of observers' performance levels. Statistical significance of the computed correlation coefficients was assessed using bootstrap confidence intervals obtained by resampling sets of examination-specific observations. RESULTS All but one of the computed pair-wise correlation coefficients were larger than 0.66 and were significantly different from zero. The correlation between the overall performance measures under the Binary and ROC paradigms was the lowest (0.43) and was not significantly different from zero (95% confidence interval -0.078 to 0.733). CONCLUSION The use of different evaluation paradigms in the laboratory tends to lead to consistent ordering of the overall performance levels of observers. However, one should recognize that conceptually similar performance indexes resulting from different paradigms often measure different performance characteristics and thus disagreements are not only possible but frequently quite natural.
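As an illustration of two of the summary measures named in this abstract, the following is a minimal sketch, not the authors' code. It assumes the binary index is Youden's index (sensitivity + specificity - 1) divided by 2 plus 0.5, and that the nonparametric AUC is the usual Mann-Whitney estimate with ties counted as one half. The function names are hypothetical.

```python
def binary_index(sensitivity, specificity):
    """Youden's index / 2 + 0.5, i.e. the mean of sensitivity and specificity."""
    youden = sensitivity + specificity - 1.0
    return youden / 2.0 + 0.5


def empirical_auc(negative_scores, positive_scores):
    """Nonparametric (Mann-Whitney) AUC: the fraction of positive/negative
    pairs in which the positive case is rated higher, ties counting 0.5."""
    total = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(positive_scores) * len(negative_scores))
```

Under these assumptions the binary index equals the area under the ROC "curve" formed by connecting the single operating point to (0, 0) and (1, 1), which is why it sits on the same 0.5 to 1 scale as the AUC.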
|
27
|
The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations. Radiology 2008; 249:47-53. [PMID: 18682584 DOI: 10.1148/radiol.2491072025] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.9] [Indexed: 02/02/2023]
Abstract
PURPOSE To compare radiologists' performance during interpretation of screening mammograms in the clinic with their performance when reading the same mammograms in a retrospective laboratory study. MATERIALS AND METHODS This study was conducted under an institutional review board-approved, HIPAA-compliant protocol; the need for informed consent was waived. Nine experienced radiologists rated an enriched set of mammograms that they had personally read in the clinic (the "reader-specific" set) mixed with an enriched "common" set of mammograms that none of the participants had previously read in the clinic by using a screening Breast Imaging Reporting and Data System (BI-RADS) rating scale. The original clinical recommendations to recall the women for a diagnostic work-up, for both reader-specific and common sets, were compared with their recommendations during the retrospective experiment. The results are presented in terms of reader-specific and group-averaged sensitivity and specificity levels and the dispersion (spread) of reader-specific performance estimates. RESULTS On average, the radiologists' performance was significantly better in the clinic than in the laboratory (P = .035). Interreader dispersion of the computed performance levels was significantly lower during the clinical interpretations (P < .01). CONCLUSION Retrospective laboratory experiments may not represent either expected performance levels or interreader variability during clinical interpretations of the same set of mammograms in the clinical environment well.
|
28
|
On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 2008; 35:1547-58. [PMID: 18491549 DOI: 10.1118/1.2890410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Abstract
The task of searching and detecting multiple abnormalities depicted on an image, or a series of images, is a common problem in different areas such as military target detection or diagnostic medical imaging. A free response receiver operating characteristic (FROC) approach for assessing performance in many of these scenarios entails marking the locations of suspected abnormalities and indicating a level of suspicion at each of the marked locations. One of the important characteristics of a system being evaluated under the FROC paradigm is its performance in the conventional ROC domain, namely classifying a subject (or a unit of interest) as "negative" or "positive" in regard to the presence of the abnormality (or any of the abnormalities) of interest. With FROC data we can compare subjects by specifying a function of multiple scores within a subject. This approach allows formulating subject-based ROC type indices that can be estimated using existing ROC concepts. In this article we focus on indices that reflect the ability of the system to discriminate between actually negative and actually positive subjects. We consider a previously proposed index that is based on the comparison of the highest scores on subjects and two new indices that are based on potentially more stable comparison functions, namely comparison of average scores and stochastic dominance. Based on these indices we develop nonparametric procedures for comparing subject-based discriminative ability of diagnostic systems being evaluated under the FROC paradigm. We also investigate the properties of the statistical procedures in a simulation study.
|
29
|
Age of child, more than HPV type, is associated with clinical course in recurrent respiratory papillomatosis. PLoS One 2008; 3:e2263. [PMID: 18509465 PMCID: PMC2386234 DOI: 10.1371/journal.pone.0002263] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Received: 01/10/2008] [Accepted: 04/02/2008] [Indexed: 11/27/2022]
Abstract
Background RRP is a devastating disease in which papillomas in the airway cause hoarseness and breathing difficulty. The disease is caused by human papillomavirus (HPV) 6 or 11 and is very variable. Patients undergo multiple surgeries to maintain a patent airway and in order to communicate vocally. Several small studies have been published, most of which have noted that HPV 11 is associated with a more aggressive course. Methodology/Principal Findings Papilloma biopsies were taken from patients undergoing surgical treatment of RRP and were subjected to HPV typing. 118 patients with juvenile-onset RRP with at least 1 year of clinical data and infected with a single HPV type were analyzed. HPV 11 was encountered in 40% of the patients. By our definition, most of the patients in the sample (81%) had run an aggressive course. The odds of a patient with HPV 11 running an aggressive course were 3.9 times higher than those of patients with HPV 6 (Fisher's exact p = 0.017). However, clinical course was more closely associated with age of the patient (at diagnosis and at the time of the current surgery) than with HPV type. Patients with HPV 11 were diagnosed at a younger age (2.4y) than were those with HPV 6 (3.4y) (p = 0.014). Both by multiple linear regression and by multiple logistic regression HPV type was only weakly associated with metrics of disease course when simultaneously accounting for age. Conclusions/Significance The course of RRP is variable and a quarter of the variability can be accounted for by the age of the patient. HPV 11 is more closely associated with a younger age at diagnosis than it is associated with an aggressive clinical course. These data suggest that there are factors other than HPV type and age of the patient that determine disease course.
|
30
|
Abstract
Free-response assessment of diagnostic systems continues to gain acceptance in areas related to the detection, localization, and classification of one or more "abnormalities" within a subject. A free-response receiver operating characteristic (FROC) curve is a tool for characterizing the performance of a free-response system at all decision thresholds simultaneously. Although the importance of a single index summarizing the entire curve over all decision thresholds is well recognized in ROC analysis (e.g., area under the ROC curve), currently there is no widely accepted summary of a system being evaluated under the FROC paradigm. In this article, we propose a new index of the free-response performance at all decision thresholds simultaneously, and develop a nonparametric method for its analysis. Algebraically, the proposed summary index is the area under the empirical FROC curve penalized for the number of erroneous marks, rewarded for the fraction of detected abnormalities, and adjusted for the effect of the target size (or "acceptance radius"). Geometrically, the proposed index can be interpreted as a measure of average performance superiority over an artificial "guessing" free-response process and it represents an analogy to the area between the ROC curve and the "guessing" or diagonal line. We derive the ideal bootstrap estimator of the variance, which can be used for a resampling-free construction of asymptotic bootstrap confidence intervals and for sample size estimation using standard expressions. The proposed procedure is free from any parametric assumptions and does not require an assumption of independence of observations within a subject. We provide an example with a dataset sampled from a diagnostic imaging study and conduct simulations that demonstrate the appropriateness of the developed procedure for the considered sample sizes and ranges of parameters.
|
31
|
Comparing areas under receiver operating characteristic curves: potential impact of the "Last" experimentally measured operating point. Radiology 2008; 247:12-5. [PMID: 18258813 DOI: 10.1148/radiol.2471071321] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
A specific issue related to the selection of the analytic tool used when comparing the estimated performance of systems within the receiver operating characteristic (ROC) paradigm is reviewed. This issue is related to the possible effect of the last experimentally ascertained ROC data point in terms of highest true-positive and false-positive fractions. An example of a case is presented where the selection of a specific analysis approach could affect the study conclusion from being not statistically significant for parametric analysis and significant for nonparametric analysis. This is followed by recommendations that should help prevent misinterpretation of results.
|
32
|
Spectral gradient acoustic reflectometry compared with tympanometry in diagnosing middle ear effusion in children aged 6 to 24 months. Arch Pediatr Adolesc Med 2007; 161:884-8. [PMID: 17768289 DOI: 10.1001/archpedi.161.9.884] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
OBJECTIVES To evaluate the accuracy of spectral gradient acoustic reflectometry (SGAR) in children aged 6 to 24 months, and to compare SGAR with tympanometry. DESIGN Comparison of diagnostic tests. SETTING Inner-city primary care center in Pittsburgh, Pennsylvania. PARTICIPANTS A total of 786 healthy children aged 6 to 24 months. MAIN OUTCOME MEASURES Test characteristics of SGAR (sensitivity, specificity, and positive and negative predictive values) and receiver operating characteristic curves from the SGAR and tympanometric data. RESULTS The SGAR results were available for 3096 otoscopic examinations in 647 children. Tympanometric results were available for 2854 otoscopic examinations in 597 children. Using the recommended SGAR pass or fail cutoff, 53% of the ears in which effusion was present would have been considered effusion free (sensitivity, 47%). Only 10% of the ears without effusion would have been considered to have effusion (specificity, 90%). The area under the receiver operating characteristic curve was 0.78 for SGAR and 0.83 for tympanometry. CONCLUSION Spectral gradient acoustic reflectometry is slightly less discerning than tympanometry in predicting the presence or absence of middle ear effusion in children younger than 2 years.
|
33
|
|
34
|
Abstract
We have observed that a very large fraction of responses for several detection tasks during the performance of observer studies are in the extreme ranges of lower than 11% or higher than 89% regardless of the actual presence or absence of the abnormality in question or its subjectively rated "subtleness." This observation raises questions regarding the validity and appropriateness of using multicategory rating scales for such detection tasks. Monte Carlo simulation of binary and multicategory ratings for these tasks demonstrate that the use of the former (binary) often results in a less biased and more precise summary index and hence may lead to a higher statistical power for determining differences between modalities.
|
35
|
Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. COMMUN STAT-SIMUL C 2007. [DOI: 10.1080/03610919508813243] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 10/23/2022]
|
36
|
|
37
|
Abstract
BACKGROUND Developmental impairments in children have been attributed to persistent middle-ear effusion in their early years of life. Previously, we reported that among children younger than 3 years of age with persistent middle-ear effusion, prompt as compared with delayed insertion of tympanostomy tubes did not result in improved cognitive, language, speech, or psychosocial development at 3, 4, or 6 years of age. However, other important components of development could not be assessed until the children were older. METHODS We enrolled 6350 infants soon after birth and evaluated them regularly for middle-ear effusion. Before 3 years of age, 429 children with persistent effusion were randomly assigned to undergo the insertion of tympanostomy tubes either promptly or up to 9 months later if effusion persisted. We assessed literacy, attention, social skills, and academic achievement in 391 of these children at 9 to 11 years of age. RESULTS Mean (±SD) scores on 48 developmental measures in the group of children who were assigned to undergo early insertion of tympanostomy tubes did not differ significantly from the scores in the group that was assigned to undergo delayed insertion. These measures included the Passage Comprehension subtest of the Woodcock Reading Mastery Tests (mean score, 98±12 in the early-treatment group and 99±12 in the delayed-treatment group); the Spelling, Writing Samples, and Calculation subtests of the Woodcock-Johnson III Tests of Achievement (96±13 and 97±16; 104±14 and 105±15; and 99±13 and 99±13, respectively); and inattention ratings on visual and auditory continuous performance tests. CONCLUSIONS In otherwise healthy young children who have persistent middle-ear effusion, as defined in our study, prompt insertion of tympanostomy tubes does not improve developmental outcomes up to 9 to 11 years of age. (ClinicalTrials.gov number, NCT00365092.)
|
38
|
The prevalence effect in a laboratory environment: Changing the confidence ratings. Acad Radiol 2007; 14:49-53. [PMID: 17178365 PMCID: PMC1769293 DOI: 10.1016/j.acra.2006.10.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 07/21/2006] [Revised: 10/03/2006] [Accepted: 10/03/2006] [Indexed: 02/06/2023]
Abstract
RATIONALE AND OBJECTIVES We sought to assess whether or not prevalence levels affected the confidence ratings of readers during the interpretation of cases in a laboratory receiver operating characteristic-type observer performance study. MATERIALS AND METHODS We reanalyzed a previously conducted observer performance study that included 14 readers and 5 different levels of prevalence. The previous study yielded the observation that in the laboratory we could not detect a "prevalence effect" in terms of differences in areas under the receiver operating characteristic curves. The detection ratings (for presence or absence) of lung nodules, interstitial disease, and pneumothorax for the five prevalence levels were compared, and a test for trend in averaged ratings as a function of abnormality prevalence was performed within a mixed-model setting that accounts for different sources of variability and correlations induced by the study design. RESULTS The ratings of the cases in terms of confidence that the specific abnormality in question is present tend, on average, to be larger when actual disease prevalence is lower. The rate of the increase of the average confidence ratings with the decreasing prevalence of a specific abnormality is very similar for actually positive and actually negative cases for every considered abnormality. The observed trend in the changes of the average confidence ratings as a function of prevalence levels was statistically significant (p < 0.01). CONCLUSION Expectations of disease prevalence in the case mix during a laboratory observer performance study may systematically affect the behavior of observers in terms of their actual confidence ratings.
|
39
|
Reader variance in ROC studies--generalizability to reader population at high and low performance levels. Acad Radiol 2006; 13:1004-10. [PMID: 16843853 DOI: 10.1016/j.acra.2006.05.015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/20/2006] [Revised: 05/26/2006] [Accepted: 05/26/2006] [Indexed: 11/30/2022]
Abstract
RATIONALE AND OBJECTIVES To investigate the variability between discriminative performances of readers as a function of average performance levels during receiver operating characteristic (ROC) studies. MATERIALS AND METHODS Four subsets of cases from previously ascertained ROC rating data by 12 observers when detecting interstitial disease and pneumothorax on posteroanterior chest films were selected for each abnormality and reanalyzed to assess changes in "reader" variance component. The subsets were selected based on a prestudy subjective assessment of the subtleness of depicted abnormality (positive cases) and the difficulty in determining its absence (negative cases). Reader variance component was estimated using a bootstrap approach for each subset and the results were used to assess a general relationship between variability and average performance level. RESULTS The reader variance component decreased substantially (from 0.007704 to 0.000426), as expected, when the areas under the ROC curves (AUC) for detecting pneumothoraces increased from 84% to 97%. On the other hand, reader variance component increased substantially (from 0.000890 to 0.005181) when AUC for detecting interstitial disease increased from 59% to 87%. The large magnitude of and changes in the reader variance component resulted in a consistent nonmonotone relationship as a function of AUC when other related variance components were included in addition to the reader component. CONCLUSION Among several factors affecting generalizability of ROC results to the population of readers, the reader variance component depended nonmonotonically on the average diagnostic performance and is lowest at both very high and very low levels of performance.
|
40
|
Abstract
OBJECTIVE We examined relationships between tympanometric findings and the presence or absence of middle-ear effusion in a population-based sample of children under the age of 3 years. METHODS In a study of children's development in relation to early-life otitis media, we enrolled 6350 infants soon after birth and evaluated them regularly for the presence of middle-ear effusion. In 3686 of the children, we compared tympanometric findings with otoscopic diagnoses. We categorized tympanograms according to varying combinations of tympanometric peak height, peak pressure, and width, and calculated for each resulting category the percentage of the associated ears diagnosed as having effusion. Using these findings we developed algorithms for estimating the probability of middle-ear effusion associated with tympanograms of any configuration. RESULTS For tympanograms generally, the lower their height and the greater their width, the greater was the probability of associated middle-ear effusion; the probability also was greater when peak pressure was negative rather than positive. Among children ≥ 6 months of age, effusion was diagnosed in only 2.7% of ears with tympanometric height ≥ 0.6 mL, but in 80.2% of ears with flat tympanograms. Relationships among younger infants were similar but less consistent. In both age groups, the tympanographic configurations most commonly encountered were associated with either a relatively low probability (<30%) or a relatively high probability (>70%) of the presence of middle-ear effusion. The receiver operating characteristic curve we generated using the algorithm we developed for children ≥ 6 months of age gave an area under the curve of 0.84. The algorithm performed equally well when applied to a separate group of children, suggesting that it is generalizable to other unselected populations. CONCLUSIONS The present report offers two alternative methods for estimating the probability of middle-ear effusion in children aged 6 through 35 months, given any combination of tympanometric values.
|
41
|
|
42
|
A permutation test for comparing ROC curves in multireader studies. Acad Radiol 2006; 13:414-20. [PMID: 16554220 DOI: 10.1016/j.acra.2005.12.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 10/03/2005] [Revised: 12/09/2005] [Accepted: 12/10/2005] [Indexed: 11/26/2022]
Abstract
RATIONALE AND OBJECTIVES The aim of the study is to develop a permutation test to compare receiver operating characteristic (ROC) curves of two diagnostic modalities in a multireader paired design. MATERIALS AND METHODS A statistical test for comparing two diagnostic modalities is developed based on all possible exchanges of the set of reader-ratings between the two modalities. An exact permutation test is formed by determining the frequency of the most extreme values of the statistic estimating the average difference in the areas under the ROC curves (AUCs). An asymptotic version of the test is constructed by obtaining the exact permutation variance and appealing to the asymptotic normality of the nonparametric estimator of the average difference in areas. Computer simulations were conducted to validate the type I error for small sample sizes. RESULTS The new test provides a permutation approach for comparing ROC curves in a multireader paired-design setting in which effects of the readers are considered to be fixed. The type I error of the asymptotic test is close to the true value, even for samples as small as 20 normal and 20 abnormal cases. The test is designed to be sensitive to alternatives in which the AUCs of the two diagnostic modalities differ. CONCLUSIONS The proposed test provides a powerful method for comparing two diagnostic modalities in a multireader paired-study design when the primary interest is to detect difference in average AUCs.
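The exchange idea described in this abstract can be sketched roughly as follows. This is a Monte Carlo approximation written for illustration only, not the authors' exact or asymptotic test: it assumes that, for each case, the whole set of reader ratings is swapped between the two modalities with probability 0.5, and that the statistic is the reader-averaged difference in empirical AUCs. All function and parameter names are hypothetical.

```python
import random


def auc(scores, labels):
    """Empirical AUC for one reader; labels are 1 (abnormal) or 0 (normal)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


def mean_auc_difference(ratings_a, ratings_b, labels):
    """Average over readers of AUC(modality A) - AUC(modality B)."""
    diffs = [auc(a, labels) - auc(b, labels)
             for a, b in zip(ratings_a, ratings_b)]
    return sum(diffs) / len(diffs)


def permutation_p_value(ratings_a, ratings_b, labels, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value.

    ratings_a[r][i] is reader r's rating of case i under modality A
    (likewise ratings_b); the same cases are read in both modalities.
    """
    rng = random.Random(seed)
    observed = abs(mean_auc_difference(ratings_a, ratings_b, labels))
    extreme = 0
    for _ in range(n_perm):
        # One swap decision per case, applied to every reader's ratings.
        swap = [rng.random() < 0.5 for _ in labels]
        perm_a = [[b if s else a for a, b, s in zip(ra, rb, swap)]
                  for ra, rb in zip(ratings_a, ratings_b)]
        perm_b = [[a if s else b for a, b, s in zip(ra, rb, swap)]
                  for ra, rb in zip(ratings_a, ratings_b)]
        stat = abs(mean_auc_difference(perm_a, perm_b, labels))
        if stat >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

Random sampling of exchanges is used here because enumerating all possible exchanges grows as 2 to the number of cases; the abstract's exact and asymptotic versions address that differently.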
|
43
|
The effect of image display size on observer performance: an assessment of variance components. Acad Radiol 2006; 13:409-13. [PMID: 16554219 DOI: 10.1016/j.acra.2005.11.033] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 09/16/2005] [Revised: 11/10/2005] [Accepted: 11/11/2005] [Indexed: 11/16/2022]
Abstract
RATIONALE AND OBJECTIVE Our goal was to investigate the effect of the displayed image size on variance components during the performance of an observer performance study to detect masses on abdominal computed tomography (CT) examinations. MATERIALS AND METHODS A previously performed receiver operating characteristic (ROC) study with eight observers to detect abdominal masses on 166 CT examinations was reanalyzed to assess variance components when comparing two similar modes with displayed image sizes varying by a factor of 2. Case, mode, and reader-related variance components were estimated for the group of eight observers and subsets of readers after excluding each of the participants. RESULTS There was no significant difference in the average area under the ROC curves between the two modes using the two image sizes (P > .05). Reader and reader-by-case variability were substantially larger for the mode displaying enlarged images for the group and all subsets formed by excluding a single reader. Reader variability was affected by one observer who actually performed better with the enlarged images. CONCLUSION Sequential viewing of enlarged CT images for the detection of abdominal masses did not improve performance and increased reader variability.
|
44
|
Prospective study of electrical impedance scanning for identifying young women at risk for breast cancer. Breast Cancer Res Treat 2006; 97:179-89. [PMID: 16491309 DOI: 10.1007/s10549-005-9109-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 05/14/2005] [Accepted: 10/26/2005] [Indexed: 10/25/2022]
Abstract
BACKGROUND One way to improve the cost-benefit ratio for breast cancer screening in younger women is to identify those at high risk of breast cancer and manage them in an optimal manner. The purpose of this study is to evaluate the sensitivity and specificity of Electrical Impedance Scanning (EIS) for identifying young women who are at risk for having breast cancer and should be followed with directed imaging technologies. METHODS A prospective, observational, two-arm, multi-site clinical trial was performed on women aged 30-45 years. The 'Sensitivity Arm' included Clinical Breast Examinations (CBE) and EIS (T-Scan 2000ED) on 189 women prior to scheduled breast biopsy. The 'Specificity Arm' included 1361 asymptomatic women visiting clinics for routine annual well-woman examination. Sensitivity and specificity were determined. Relative probability for a woman with a positive EIS examination was computed and compared with other approaches commonly used to define 'high-risk' in this population. RESULTS Fifty of 189 women in the Sensitivity arm had verified cancers, 19 of whom had a positive EIS examination, resulting in a sensitivity of 38% (19/50). Of the 1361 women in the Specificity arm, 67 had a positive EIS examination, resulting in a specificity of 95% (1294/1361). The relative probability of a woman with a positive EIS examination was 7.68, which compares favorably with other established risk identifiers (e.g. two first-degree relatives with breast cancer or atypical ductal hyperplasia). CONCLUSION EIS may have an important role as a screening tool for identifying young women who should be followed more closely with advanced imaging technologies for early detection of breast cancer.
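The sensitivity and specificity quoted in this abstract can be reproduced from its counts. The short sketch below is only illustrative arithmetic: the function names are hypothetical, and it assumes, as the abstract's 1294/1361 figure implies, that all 1361 asymptomatic women in the Specificity arm are treated as disease-free.

```python
def sensitivity(true_positives, diseased):
    """Fraction of diseased cases that test positive."""
    return true_positives / diseased


def specificity(true_negatives, disease_free):
    """Fraction of disease-free cases that test negative."""
    return true_negatives / disease_free


# Counts taken from the abstract.
sens = sensitivity(19, 50)           # 19 positive EIS exams among 50 verified cancers
spec = specificity(1361 - 67, 1361)  # 67 positive EIS exams among 1361 asymptomatic women

print(f"sensitivity = {sens:.1%}")  # 38.0%
print(f"specificity = {spec:.1%}")  # 95.1%
```

The abstract does not state how the relative probability of 7.68 was derived, so it is not recomputed here.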
|
45
|
A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Stat Med 2005; 24:2873-93. [PMID: 16134144 DOI: 10.1002/sim.2149] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023]
Abstract
The area under the receiver operating characteristic (ROC) curve (AUC) is a widely accepted summary index of the overall performance of diagnostic procedures, and the difference between AUCs is often used when comparing two diagnostic systems. We developed an exact non-parametric statistical procedure for comparing two ROC curves in paired design settings. The test, which is based on all permutations of the subject-specific rank ratings, is formally a test for equality of ROC curves that is sensitive to the alternatives of AUC difference. The operating characteristics of the proposed test were evaluated using extensive simulations over a wide range of parameters. The proposed procedure can be easily implemented in experimental ROC data sets. For small samples and for underlying parameters that are common in experimental studies in diagnostic imaging, the test possesses good operating characteristics and is more powerful than the conventional non-parametric procedure for AUC comparisons. We also derived an asymptotic version of the test which uses an exact estimate of the variance in the permutation space and provides a good approximation even when the sample sizes are small. This asymptotic procedure is a simple and precise approximation to the exact test and is useful for large sample sizes where the exact test may be computationally burdensome.
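The general idea of an exact paired permutation test on the AUC difference can be sketched as follows. This is a simplified illustration, not the authors' procedure: it exchanges raw ratings between the two modalities within each subject (so it assumes both modalities use the same rating scale) rather than working with the rank ratings described in the abstract. The function names and the toy data are hypothetical.

```python
from itertools import product


def auc(neg, pos):
    """Mann-Whitney AUC estimate: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_diff(cases):
    """cases: list of (is_abnormal, rating_A, rating_B), one tuple per subject."""
    neg_a = [a for y, a, b in cases if not y]
    pos_a = [a for y, a, b in cases if y]
    neg_b = [b for y, a, b in cases if not y]
    pos_b = [b for y, a, b in cases if y]
    return auc(neg_a, pos_a) - auc(neg_b, pos_b)


def exact_permutation_pvalue(cases):
    """Two-sided p-value over all 2**n within-subject exchanges of the two ratings."""
    observed = abs(auc_diff(cases))
    extreme = 0
    total = 0
    for swaps in product((False, True), repeat=len(cases)):
        permuted = [(y, b, a) if s else (y, a, b)
                    for (y, a, b), s in zip(cases, swaps)]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(auc_diff(permuted)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Toy data (made up for illustration): 4 normal and 4 abnormal subjects.
toy_cases = [
    (False, 1, 2), (False, 2, 3), (False, 1, 1), (False, 3, 2),
    (True, 4, 3), (True, 5, 2), (True, 3, 3), (True, 4, 1),
]
p_value = exact_permutation_pvalue(toy_cases)
print(f"observed AUC difference = {auc_diff(toy_cases):.3f}, p = {p_value:.3f}")
```

Full enumeration grows as 2^n in the number of subjects, which is why an asymptotic approximation such as the one the abstract mentions becomes attractive for larger samples.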
|
46
|
Variability in observer performance studies: experimental observations. Acad Radiol 2005; 12:1527-33. [PMID: 16321741 DOI: 10.1016/j.acra.2005.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 06/14/2005] [Revised: 08/11/2005] [Accepted: 08/11/2005] [Indexed: 10/25/2022]
Abstract
RATIONALE AND OBJECTIVES The aim of the study is to assess variance components in observer performance studies and the possible impact on study results and conclusions. MATERIALS AND METHODS Two previously performed retrospective receiver operating characteristic-type observer performance studies to evaluate the performance of seven radiologists in detecting interstitial disease on conventional posteroanterior chest films and nine radiologists in detecting interstitial disease on a high-resolution workstation were reanalyzed by using the Beiden, Wagner, and Campbell nine-component model to estimate the different variance components. We estimated case-, reader-, and mode-related components of the variance for the group as a whole and after excluding (round robin) each reader. Overall variance was evaluated, and the effect of individual readers on overall study conclusions was assessed. RESULTS Overall results and conclusions of the reanalysis agreed with the original one in that, as a group, radiologists performed significantly better when using conventional films (P < .05) in both studies. Reader variability was large compared with all other components, and in one study, it was substantially larger for the workstation reading mode. Reader variability was affected substantially by one observer in each study, and in one study, reader-by-mode variability was affected by another reader who performed better on the workstation. CONCLUSION Estimates of variance components can shed light on the appropriateness of study design, as well as the sensitivity of results to the inclusion (or exclusion) of individual observers.
|
47
|
Abstract
The MacArthur-Bates Communicative Development Inventories (CDI; Dale, 1996; Fenson et al., 1994), parent reports about language skills, are being used increasingly in studies of theoretical and public health importance. This study (N = 113) correlated scores on the CDI at ages 2 and 3 years with scores at age 3 years on tests of cognition and receptive language and measures from parent-child conversation. Associations indicated reasonable concurrent and predictive validity. The findings suggest that satisfactory vocabulary scores at age 2 are likely to predict normal language skills at age 3, although some children with limited skills at age 3 will have had satisfactory scores at age 2. Many children with poor vocabulary scores at 2 will have normal skills at 3.
|
48
|
Incorporating utility-weights when comparing two diagnostic systems: a preliminary assessment. Acad Radiol 2005; 12:1293-300. [PMID: 16179206 DOI: 10.1016/j.acra.2005.05.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/15/2005] [Revised: 05/26/2005] [Accepted: 05/26/2005] [Indexed: 11/22/2022]
Abstract
RATIONALE AND OBJECTIVES We sought to develop a new index that incorporates utility-weights when assessing the overall performance of a diagnostic system and to provide a statistical test for comparing two indices in a paired study design. MATERIALS AND METHODS The area under the receiver operating characteristic (ROC) curve (AUC) was used as the basis for constructing a new index. The index we propose represents a weighted average of class-specific AUCs, each of which relates to a class of pairs of actually negative (normal) and actually positive (abnormal) cases with a specific predetermined utility (or clinical importance). For each pair of normal-abnormal cases, the utility is defined a priori and based on external (covariate) information. In the proposed approach, utility-weights represent the relative importance (utility) of discriminating between different types of normal and abnormal cases (pairs of the same type are combined in the classes termed utility-classes). We also describe a simple nonparametric procedure for comparing the proposed indices as computed from paired data. Computer simulations were conducted to evaluate the behavior of the type I error of the proposed test in the simple albeit important instance of two utility-classes. RESULTS The new index provides an extension of the commonly used area under the ROC curve. It allows for incorporation of utility-weights into the analysis and reduces to the conventional AUC index when all assigned utility-weights are equal to unity. Computer simulations indicate that in the considered scenario of two utility-classes, the type I error of the proposed test is comparable to that of the conventional nonparametric test for equality of AUC indices. CONCLUSIONS The proposed index and the statistical test provide a practical approach to incorporating utilities when comparing diagnostic systems.
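One form of a utility-weighted index consistent with this description can be sketched as a pair-weighted Mann-Whitney statistic. The normalization used here (dividing by the sum of the pair weights) is an assumption chosen so that the index reduces to the conventional AUC when every weight equals one; the abstract does not give the authors' exact formula. The function name, case-type labels, and toy data are hypothetical.

```python
def weighted_auc(neg, pos, utility):
    """Pair-weighted AUC.

    neg, pos: lists of (rating, case_type) for normal and abnormal cases.
    utility:  function (neg_type, pos_type) -> non-negative weight for that pair.
    """
    numerator = 0.0
    denominator = 0.0
    for n_rating, n_type in neg:
        for p_rating, p_type in pos:
            w = utility(n_type, p_type)
            if p_rating > n_rating:
                score = 1.0   # abnormal case correctly rated higher
            elif p_rating == n_rating:
                score = 0.5   # tie counts as half
            else:
                score = 0.0
            numerator += w * score
            denominator += w
    return numerator / denominator


# Toy data (made up): two normal and two abnormal cases with type labels.
normals = [(1, "easy"), (3, "hard")]
abnormals = [(2, "small"), (4, "large")]

# Unit weights give the conventional AUC.
plain = weighted_auc(normals, abnormals, lambda n, p: 1.0)

# Two utility-classes: pairs involving a "hard" normal count twice as much.
weighted = weighted_auc(normals, abnormals,
                        lambda n, p: 2.0 if n == "hard" else 1.0)

print(plain, weighted)
```

In this toy example the weighted index falls below the plain AUC because the one misordered pair involves the up-weighted "hard" normal case.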
|
50
|
Abstract
BACKGROUND To prevent later developmental impairments, myringotomy with the insertion of tympanostomy tubes has often been undertaken in young children who have persistent otitis media with effusion. We previously reported that prompt as compared with delayed insertion of tympanostomy tubes in children with persistent effusion who were younger than three years of age did not result in improved developmental outcomes at three or four years of age. However, the effect on the outcomes of school-age children is unknown. METHODS We enrolled 6350 healthy infants younger than 62 days of age and evaluated them regularly for middle-ear effusion. Before three years of age, 429 children with persistent middle-ear effusion were randomly assigned to have tympanostomy tubes inserted either promptly or up to nine months later if effusion persisted. We assessed developmental outcomes in 395 of these children at six years of age. RESULTS At six years of age, 85 percent of children in the early-treatment group and 41 percent in the delayed-treatment group had received tympanostomy tubes. There were no significant differences in mean (±SD) scores favoring early versus delayed treatment on any of 30 measures, including the Wechsler Full-Scale Intelligence Quotient (98±13 vs. 98±14); Number of Different Words test, a measure of word diversity (183±36 vs. 175±36); Percentage of Consonants Correct-Revised test, a measure of speech-sound production (96±2 vs. 96±3); the SCAN test, a measure of central auditory processing (95±15 vs. 96±14); and several measures of behavior and emotion. CONCLUSIONS In otherwise healthy children younger than three years of age who have persistent middle-ear effusion within the duration of effusion that we studied, prompt insertion of tympanostomy tubes does not improve developmental outcomes at six years of age.
|