1
|
Chaibub Neto E, Yadav V, Sieberts SK, Omberg L. A novel estimator for the two-way partial AUC. BMC Med Inform Decis Mak 2024; 24:57. [PMID: 38378636 PMCID: PMC10877829 DOI: 10.1186/s12911-023-02382-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 11/27/2023] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND The two-way partial AUC has been recently proposed as a way to directly quantify partial area under the ROC curve with simultaneous restrictions on the sensitivity and specificity ranges of diagnostic tests or classifiers. The metric, as originally implemented in the tpAUC R package, is estimated using a nonparametric estimator based on a trimmed Mann-Whitney U-statistic, which becomes computationally expensive in large sample sizes. (Its computational complexity is of order [Formula: see text], where [Formula: see text] and [Formula: see text] represent the number of positive and negative cases, respectively.) This is problematic since the statistical methodology for comparing estimates generated from alternative diagnostic tests/classifiers relies on bootstrap resampling and requires repeated computations of the estimator on a large number of bootstrap samples. METHODS By leveraging the graphical and probabilistic representations of the AUC, partial AUCs, and two-way partial AUC, we derive a novel estimator for the two-way partial AUC, which can be directly computed from the output of any software able to compute AUC and partial AUCs. We implemented our estimator using the computationally efficient pROC R package, which leverages a nonparametric approach using the trapezoidal rule for the computation of AUC and partial AUC scores. (Its computational complexity is of order [Formula: see text], where [Formula: see text].) We compare the empirical bias and computation time of the proposed estimator against the original estimator provided in the tpAUC package in a series of simulation studies and on two real datasets. RESULTS Our estimator tended to be less biased than the original estimator based on the trimmed Mann-Whitney U-statistic across all experiments (and showed considerably less bias in the experiments based on small sample sizes). But, most importantly, because the computational complexity of the proposed estimator is of order [Formula: see text], rather than [Formula: see text], it is much faster to compute when sample sizes are large. CONCLUSIONS The proposed estimator provides an improvement for the computation of two-way partial AUC, and allows the comparison of diagnostic tests/machine learning classifiers in large datasets where repeated computations of the original estimator on bootstrap samples become too expensive to compute.
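To make the cost contrast in this abstract concrete, the sketch below computes the ordinary (full) AUC in two ways: by comparing every positive score with every negative score, and by sorting the negative scores once and locating each positive score with a binary search. It is an illustration only. It is not the authors' two-way partial AUC estimator, and it does not reproduce the tpAUC or pROC implementations; the function names and sample scores are hypothetical.

```python
from bisect import bisect_left, bisect_right


def auc_pairwise(pos, neg):
    """Mann-Whitney style AUC: compare every positive with every negative.

    Cost grows with len(pos) * len(neg). Ties count as one half.
    """
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))


def auc_sorted(pos, neg):
    """Same quantity, computed by sorting the negatives once.

    Each positive is then located with a binary search, so the cost is
    dominated by the sort rather than by all pairwise comparisons.
    """
    neg_sorted = sorted(neg)
    total = 0.0
    for p in pos:
        below = bisect_left(neg_sorted, p)            # negatives strictly below p
        tied = bisect_right(neg_sorted, p) - below    # negatives equal to p
        total += below + 0.5 * tied
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores, for illustration only.
    positives = [0.9, 0.8, 0.4]
    negatives = [0.7, 0.4, 0.1]
    print(auc_pairwise(positives, negatives), auc_sorted(positives, negatives))
```

Both functions return the same value; only the amount of work differs, which matters when the computation is repeated over many bootstrap samples.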
Affiliation(s)
- Vijay Yadav
- Sage Bionetworks, 2901 Third Avenue, 98121, Seattle, USA
- Larsson Omberg
- Sage Bionetworks, 2901 Third Avenue, 98121, Seattle, USA
|
2
|
Garg K, Campolonghi S. A Step-by-Step Guide to Selecting an Optimal Cut-Off Value Based on the Receiver Operating Characteristic and Youden Index in Methods Designed to Diagnose Lyme Disease. Methods Mol Biol 2024; 2742:69-76. [PMID: 38165615 DOI: 10.1007/978-1-0716-3561-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]
Abstract
Detection tools designed to diagnose complex diseases such as Lyme Borreliosis require an optimal cutoff point to distinguish the healthy from the diseased. The chapter will provide a practical guide to selecting an optimal cutoff mark by creating the receiver operating characteristic (ROC) in Microsoft Excel. To guide the creation of a ROC graphical plot, we will use example data from an enzyme-linked immunosorbent assay (ELISA) measuring anti-human immunoglobulin G (IgG) against whole-cell Borrelia lysates. Herein, the ROC method will demonstrate that an optical density (OD) value from ELISA with the highest Youden Index (J) is an optimal cutoff value to differentiate positive and negative IgG immune responses in human serum samples.
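The cutoff selection described in this abstract can be sketched in a few lines of Python. The Youden Index is J = sensitivity + specificity - 1, and the candidate cutoff with the largest J is kept. The chapter itself works in Microsoft Excel; the function name, the OD values, and the convention that a sample at or above the cutoff is called positive are all assumptions made for this illustration.

```python
def youden_cutoff(positive_ods, negative_ods):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    Assumes a sample is called positive when its value is >= cutoff.
    Every observed value is tried as a candidate cutoff; on ties in J the
    lowest cutoff is kept.
    """
    best = None
    for cutoff in sorted(set(positive_ods) | set(negative_ods)):
        sensitivity = sum(v >= cutoff for v in positive_ods) / len(positive_ods)
        specificity = sum(v < cutoff for v in negative_ods) / len(negative_ods)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


if __name__ == "__main__":
    # Hypothetical optical density readings, not data from the chapter.
    diseased = [0.8, 0.9, 1.2, 0.5]
    healthy = [0.2, 0.3, 0.6, 0.1]
    print(youden_cutoff(diseased, healthy))
```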
|
3
|
Touraine C, Winter A, Castan F, Azria D, Gourgou S. Time-Dependent ROC Curve Analysis for Assessing the Capability of Radiation-Induced CD8 T-Lymphocyte Apoptosis to Predict Late Toxicities after Adjuvant Radiotherapy of Breast Cancer Patients. Cancers (Basel) 2023; 15:4676. [PMID: 37835370 PMCID: PMC10571898 DOI: 10.3390/cancers15194676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/15/2023] [Accepted: 09/19/2023] [Indexed: 10/15/2023] Open
Abstract
Late fibrosis can occur in breast cancer patients treated with curative-intent radiotherapy. Predicting this toxicity is of clinical interest in order to adapt the irradiation dose delivered. Radiation-induced CD8 T-lymphocyte apoptosis (RILA) has been proven to be associated with fewer grade ≥2 late radiation-induced toxicities in patients with miscellaneous cancers. Tobacco smoking status and adjuvant hormonotherapy were also identified as potential factors related to late-breast-fibrosis-free survival. This article evaluates the predictive performance of the RILA using a ROC curve analysis that takes into account the dynamic nature of fibrosis occurrence. This time-dependent ROC curve approach is also applied to evaluate the ability of the RILA combined with the other previously identified factors. Our analysis includes a Monte Carlo cross-validation procedure and the calculation of an expected cost of misclassification, which gives more weight to patients who have no risk of late fibrosis in order to be able to treat them with the maximal irradiation dose. Performance was assessed at 12, 24, 36 and 50 months. At 36 months, our results were comparable to those obtained in a previous study, thus underlining the predictive power of the RILA. Based on specificity and cost, RILA alone seemed to perform best, while its combination with the other factors yielded better negative predictive values.
Affiliation(s)
- Célia Touraine
- Biometrics Unit, Cancer Institute of Montpellier (ICM), University Montpellier, 34090 Montpellier, France
- French National Platform Quality of Life and Cancer, 34090 Montpellier, France
- Desbrest Institute of Epidemiology and Public Health (IDESP), University Montpellier, INSERM, 34090 Montpellier, France
- Audrey Winter
- Biometrics Unit, Cancer Institute of Montpellier (ICM), University Montpellier, 34090 Montpellier, France
- French National Platform Quality of Life and Cancer, 34090 Montpellier, France
- Florence Castan
- Biometrics Unit, Cancer Institute of Montpellier (ICM), University Montpellier, 34090 Montpellier, France
- David Azria
- Radiotherapy Unit, Cancer Institute of Montpellier (ICM), University Montpellier, 34090 Montpellier, France
- Sophie Gourgou
- Biometrics Unit, Cancer Institute of Montpellier (ICM), University Montpellier, 34090 Montpellier, France
- French National Platform Quality of Life and Cancer, 34090 Montpellier, France
|
4
|
Estévez-Pérez G, Vieu P. A new way for ranking functional data with applications in diagnostic test. Comput Stat 2020. [DOI: 10.1007/s00180-020-01020-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
5
|
Yu H, Li F, Wu T, Li R, Yao L, Wang C, Wu X. Functional brain abnormalities in major depressive disorder using the Hilbert-Huang transform. Brain Imaging Behav 2018; 12:1556-1568. [PMID: 29427063 DOI: 10.1007/s11682-017-9816-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
Major depressive disorder is a common disease worldwide, which is characterized by significant and persistent depression. Non-invasive accessory diagnosis of depression can be performed by resting-state functional magnetic resonance imaging (rs-fMRI). However, the fMRI signal may not satisfy linearity and stationarity. The Hilbert-Huang transform (HHT) is an adaptive time-frequency localization analysis method suitable for nonlinear and non-stationary signals. The objective of this study was to apply the HHT to rs-fMRI to find the abnormal brain areas of patients with depression. A total of 35 patients with depression and 37 healthy controls were subjected to rs-fMRI. The HHT was performed to extract the Hilbert-weighted mean frequency of the rs-fMRI signals, and multivariate receiver operating characteristic analysis was applied to find the abnormal brain regions with high sensitivity and specificity. We observed differences in Hilbert-weighted mean frequency between the patients and healthy controls mainly in the right hippocampus, right parahippocampal gyrus, left amygdala, and left and right caudate nucleus. Subsequently, the above-mentioned regions were included in the results obtained from the compared region homogeneity and the fractional amplitude of low frequency fluctuation method. We found brain regions with differences in the Hilbert-weighted mean frequency, and examined their sensitivity and specificity, which suggested a potential neuroimaging biomarker to distinguish between patients with depression and healthy controls. We further clarified the pathophysiological abnormality of these regions for the population with major depressive disorder.
Affiliation(s)
- Haibin Yu
- College of Information Science and Technology, Beijing Normal University, No. 19 Xin Jie Kou Wai Da Jie, Beijing, 100875, China
- Feng Li
- Beijing Key Laboratory for Mental Disorders, Center of Schizophrenia, Beijing Institute for Brain Disorders, Beijing Anding Hospital of Capital Medical University, Beijing, 10088, China
- Tong Wu
- College of Information Science and Technology, Beijing Normal University, No. 19 Xin Jie Kou Wai Da Jie, Beijing, 100875, China
- Rui Li
- CAS Key Laboratory of Mental Health, Institute of Psychology, Beijing, 100101, China
- Li Yao
- College of Information Science and Technology, Beijing Normal University, No. 19 Xin Jie Kou Wai Da Jie, Beijing, 100875, China; State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
- Chuanyue Wang
- Beijing Key Laboratory for Mental Disorders, Center of Schizophrenia, Beijing Institute for Brain Disorders, Beijing Anding Hospital of Capital Medical University, Beijing, 10088, China
- Xia Wu
- College of Information Science and Technology, Beijing Normal University, No. 19 Xin Jie Kou Wai Da Jie, Beijing, 100875, China; State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
|
6
|
Defining optimal cutoff value of MGMT promoter methylation by ROC analysis for clinical setting in glioblastoma patients. J Neurooncol 2017; 133:193-201. [PMID: 28516344 DOI: 10.1007/s11060-017-2433-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/19/2016] [Accepted: 04/14/2017] [Indexed: 12/31/2022]
Abstract
Resistance to temozolomide (TMZ) chemotherapy poses a significant challenge in the treatment of glioblastoma (GBM). Hypermethylation in O6-methylguanine-DNA methyltransferase (MGMT) promoter is thought to play a critical role in this resistance. Pyrosequencing (PSQ) has been shown to be accurate and robust for MGMT promoter methylation testing. The unresolved issue is the determination of a cut-off value for dichotomization of quantitative MGMT PSQ results into "MGMT methylated" and "MGMT unmethylated" patient subgroups as a basis for further treatment decisions. In this study, receiver operating characteristic (ROC) curve analysis was used to identify an optimal cutoff of MGMT promoter methylation by testing mean percentage of methylation of 4 CpG islands (76-79) within MGMT exon 1. The area under the ROC curve (AUC) as well as the best cutoff to classify the methylation were calculated. Positive likelihood ratio (LR+) was chosen as a diagnostic parameter for defining an optimal cut-off. We also analyzed whether mean percentage of methylation at the investigated CpG islands could be regarded as a prognostic marker. ROC analysis showed that the optimal threshold was 12.5% (sensitivity: 60.87%; specificity: 76%), corresponding to the largest LR+ (2.54). A cutoff of 12.5% was therefore established to distinguish MGMT promoter methylation status, which was confirmed using a validation set. According to this cutoff value, MGMT promoter methylation was found in 58.3% of GBM. Mean methylation level of the investigated CpG sites strongly correlated with overall survival (OS): GBM patients with a high level of methylation survived longer than those with a low level of methylation (log-rank test, P = 0.017). In conclusion, ROC curve analysis enables identification of the best cutoff for discriminating MGMT promoter methylation status. LR+ can be used as a key factor for evaluating a cutoff. The promoter methylation level of MGMT by PSQ in GBM patients had prognostic value.
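The reported LR+ can be checked against the reported sensitivity and specificity using the standard definition LR+ = sensitivity / (1 - specificity). The short Python sketch below does that arithmetic; the function name is hypothetical, and the inputs are the two percentages quoted in the abstract.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), with both inputs as fractions."""
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1) for LR+ to be finite")
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    # Values quoted in the abstract: sensitivity 60.87%, specificity 76%.
    lr_plus = positive_likelihood_ratio(0.6087, 0.76)
    print(round(lr_plus, 2))  # prints 2.54
```

The result, 0.6087 / 0.24 ≈ 2.54, agrees with the LR+ stated for the 12.5% threshold.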
|
7
|
Zou G. From Diagnostic Accuracy to Accurate Diagnosis: Interpreting a Test Result with Confidence. Med Decis Making 2016; 24:313-8. [PMID: 15155020 DOI: 10.1177/0272989x04265483] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
Background. The Standard for Reporting of Diagnostic Accuracy statement promotes the reporting of confidence intervals (CIs) for indices of diagnostic test accuracy. However, these indices must be combined with an estimate of pretest probability to properly interpret the results of such tests, thus yielding positive and negative predictive values. For small sample sizes, CI estimation for predictive values based on the classical logit transformation has been found to be very conservative. A method based on computer simulation has therefore been suggested as an alternative. Methods. A CI procedure for predictive values that yields limits completely contained in those provided by the logit transformation is proposed and evaluated. Results. The proposed approach to CI construction maintains nominal coverage very well even when sample sizes are small. Conclusion. Accurate CIs for positive and negative predictive values can be obtained without using computer simulation.
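The step of combining accuracy indices with a pretest probability follows from Bayes' theorem and can be sketched as below. This shows only the point estimates of the positive and negative predictive values; it does not implement the CI procedure proposed in the article, which the abstract does not spell out. The function name and the example numbers are hypothetical.

```python
def predictive_values(sensitivity, specificity, pretest_probability):
    """Return (PPV, NPV) from sensitivity, specificity and pretest probability.

    PPV = P(disease | positive test), NPV = P(no disease | negative test),
    both obtained with Bayes' theorem. All inputs are fractions in [0, 1].
    """
    p = pretest_probability
    true_pos = sensitivity * p
    false_pos = (1.0 - specificity) * (1.0 - p)
    true_neg = specificity * (1.0 - p)
    false_neg = (1.0 - sensitivity) * p
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test: 90% sensitive, 80% specific, 10% pretest probability.
    ppv, npv = predictive_values(0.90, 0.80, 0.10)
    print(ppv, npv)
```

With a low pretest probability the PPV stays modest even for a fairly accurate test, which is why the accuracy indices alone are not enough to interpret a result.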
Affiliation(s)
- Guangyong Zou
- Robarts Research Institute and the Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada.
|
8
|
Kim E, Zeng D, Zhou XH. Semiparametric transformation models for multiple continuous biomarkers in ROC analysis. Biom J 2015; 57:808-33. [PMID: 26138227 DOI: 10.1002/bimj.201400043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 02/22/2014] [Revised: 01/24/2015] [Accepted: 02/06/2015] [Indexed: 11/11/2022]
Abstract
Recent technological advances continue to provide noninvasive and more accurate biomarkers for evaluating disease status. One standard tool for assessing the accuracy of diagnostic tests is the receiver operating characteristic (ROC) curve. Few statistical methods exist to accommodate multiple continuous-scale biomarkers in the framework of ROC analysis. In this paper, we propose a method to integrate continuous-scale biomarkers to optimize classification accuracy. Specifically, we develop semiparametric transformation models for multiple biomarkers. We assume that unknown and marker-specific transformations of biomarkers follow a multivariate normal distribution. Our models accommodate biomarkers subject to limits of detection and account for the dependence among biomarkers by including a subject-specific random effect. We also propose a diagnostic measure using an optimal linear combination of the transformed biomarkers. Our diagnostic rule does not depend on any monotone transformation of biomarkers and is not sensitive to extreme biomarker values. Nonparametric maximum likelihood estimation (NPMLE) is used for inference. We show that the parameter estimators are asymptotically normal and efficient. We illustrate our semiparametric approach using data from the Endometriosis, Natural History, Diagnosis, and Outcomes (ENDO) study.
Affiliation(s)
- Eunhee Kim
- Department of Biostatistics and Center for Statistical Sciences, Brown University, Providence, RI, 02912, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, NC, 27599, USA
- Xiao-Hua Zhou
- HSRD Center of Excellence, VA Puget Sound Health Care System and Department of Biostatistics, University of Washington, Seattle, WA, 98198, USA
|
9
|
Parast L, Cai T. Landmark risk prediction of residual life for breast cancer survival. Stat Med 2013; 32:3459-71. [PMID: 23494768 DOI: 10.1002/sim.5776] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/02/2012] [Revised: 12/31/2012] [Accepted: 02/07/2013] [Indexed: 12/31/2022]
Abstract
The importance of developing personalized risk prediction estimates has become increasingly evident in recent years. In general, patient populations may be heterogeneous and represent a mixture of different unknown subtypes of disease. When the source of this heterogeneity and resulting subtypes of disease are unknown, accurate prediction of survival may be difficult. However, in certain disease settings, the onset time of an observable short-term event may be highly associated with these unknown subtypes of disease and thus may be useful in predicting long-term survival. One approach to incorporate short-term event information along with baseline markers for the prediction of long-term survival is through a landmark Cox model, which assumes a proportional hazards model for the residual life at a given landmark point. In this paper, we use this modeling framework to develop procedures to assess how a patient's long-term survival trajectory may change over time given good short-term outcome indications along with prognosis on the basis of baseline markers. We first propose time-varying accuracy measures to quantify the predictive performance of landmark prediction rules for residual life and provide resampling-based procedures to make inference about such accuracy measures. Simulation studies show that the proposed procedures perform well in finite samples. Throughout, we illustrate our proposed procedures by using a breast cancer dataset with information on time to metastasis and time to death. In addition to baseline clinical markers available for each patient, a chromosome instability genetic score, denoted by CIN25, is also available for each patient and has been shown to be predictive of survival for various types of cancer. We provide procedures to evaluate the incremental value of CIN25 for the prediction of residual life and examine how the residual life profile changes over time. This allows us to identify an informative landmark point, t(0), such that accurate risk predictions of the residual life could be made for patients who survive past t(0) without metastasis.
Affiliation(s)
- Layla Parast
- RAND Corporation, 1776 Main Street, Santa Monica, CA 90401, U.S.A.
|
10
|
Parast L, Cai B, Bedayat A, Kumamaru KK, George E, Dill KE, Rybicki FJ. Statistical methods for predicting mortality in patients diagnosed with acute pulmonary embolism. Acad Radiol 2012; 19:1465-73. [PMID: 23122566 DOI: 10.1016/j.acra.2012.09.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/04/2012] [Revised: 09/17/2012] [Accepted: 09/18/2012] [Indexed: 12/20/2022]
Abstract
RATIONALE AND OBJECTIVES Risk stratification in pulmonary embolism (PE) guides patient management. The purpose of this study was to develop and test novel mortality risk prediction models for subjects with acute PE diagnosed using computed tomographic pulmonary angiography in a large cohort with comprehensive clinical data. MATERIALS AND METHODS Retrospective analyses of 1596 consecutive subjects diagnosed with acute PE from a single, large, urban teaching hospital included two modern statistical methods to predict survival in patients with acute PE. Landmark analysis was used for 90-day mortality. Adaptive least absolute shrinkage and selection operator (aLASSO), a penalization method, was used to select variables important for prediction and to estimate model coefficients. Receiver-operating characteristic analysis was used to evaluate the resulting prediction rules. RESULTS Using 30-day all-cause mortality outcome, three of the 16 clinical risk factors (the presence of a known malignancy, coronary artery disease, and increased age) were associated with high risk, while subjects treated with anticoagulation had lower risk. For 90-day landmark mortality, subjects with recent operations had a lower risk for death. Both prediction rules developed using aLASSO performed well compared to standard logistic regression. CONCLUSIONS The aLASSO regression approach combined with landmark analysis provides a novel tool for large patient populations and can be applied for clinical risk stratification among subjects diagnosed with acute PE. After positive results on computed tomographic pulmonary angiography, the presence of a known malignancy, coronary artery disease, and advanced age increase 30-day mortality. Additional risk stratification can be simplified with these methods, and future work will place imaging-based prediction of mortality in perspective with other clinical data.
Affiliation(s)
- Layla Parast
- Department of Biostatistics, Harvard University, Cambridge, MA, USA
|
11
|
Babchishin KM, Hanson RK, Helmus L. Even Highly Correlated Measures Can Add Incrementally to Predicting Recidivism Among Sex Offenders. Assessment 2012; 19:442-61. [DOI: 10.1177/1073191112458312] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 11/15/2022]
Abstract
Criterion-referenced measures, such as those used in the assessment of crime and violence, prioritize predictive accuracy (discrimination) at the expense of construct validity. In this article, we compared the discrimination and incremental validity of three commonly used criterion-referenced measures for sex offenders (Rapid Risk Assessment for Sex Offence Recidivism [RRASOR], Static-99R, and Static-2002R). In a meta-analysis of 20 samples ( n = 7,491), Static-99R and Static-2002R provided similar discrimination but outperformed the RRASOR in the prediction of sexual, violent, and any recidivism. Remarkably, despite large correlations between them ( rs ranging from .70 to .92), these risk scales consistently added incremental validity to one another. The direction of the incremental effects, however, was not consistently positive. When controlling for the other measures, high scores on the RRASOR were associated with lower risk for violent and any recidivism. We also examined different methods of combining risk scales and found that the averaging approach produced better discrimination than choosing the highest score and produced better calibration than either choosing the lowest or highest risk score. The findings reinforce the importance of understanding the psychological content of criterion-referenced measures, even when the sole purpose is to predict a particular outcome and provide some direction concerning the best methods for combining risk scales.
Affiliation(s)
- Kelly M. Babchishin
- Public Safety Canada, Ottawa, Ontario, Canada
- Carleton University, Ottawa, Ontario, Canada
|
12
|
Guan Z, Qin J, Zhang B. Information borrowing methods for covariate-adjusted ROC curve. Can J Stat 2012. [DOI: 10.1002/cjs.11145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
|
13
|
Abstract
Two types of confidence intervals (CIs) and confidence bands (CBs) for the receiver operating characteristic (ROC) curve are studied: pointwise CIs and simultaneous CBs. An optimized version of the pointwise CI with the shortest width is developed. A new ellipse-envelope simultaneous CB for the ROC curve is suggested as an adaptation of the Working-Hotelling-type CB implemented in a paper by Ma and Hall (1993). Statistical simulations show that our ellipse-envelope CB covers the true ROC curve with a probability close to nominal, while the coverage probability of the Ma and Hall CB is significantly smaller. Simulations also show that the coverage probability of our CI for the area under the ROC curve is close to nominal, while the coverage probability of the CI suggested by Hanley and McNeil (1982) uniformly overestimates the nominal value. Two examples illustrate our simultaneous ROC bands: radiation dose estimation from time to vomiting and discrimination of breast cancer from benign abnormalities using electrical impedance measurements.
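For orientation, the comparison interval attributed here to Hanley and McNeil (1982) is commonly computed from their standard-error approximation for the AUC, combined with a normal (Wald) interval. The Python sketch below implements that widely cited formula under those assumptions; it is not the ellipse-envelope band or the optimized CI proposed in this abstract, and the function names and example inputs are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC using the Hanley-McNeil approximation.

    Q1 and Q2 are the approximations A / (2 - A) and 2A^2 / (1 + A).
    n_pos and n_neg are the numbers of diseased and non-diseased subjects.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def wald_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation CI for the AUC, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical study: AUC of 0.80 with 50 cases and 50 controls.
    print(hanley_mcneil_se(0.80, 50, 50))
    print(wald_interval(0.80, 50, 50))
```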
|
14
|
Hajian-Tilaki K, Hanley JA, Nassiri V. An extension of parametric ROC analysis for calculating diagnostic accuracy when underlying distributions are mixture of Gaussian. J Appl Stat 2011. [DOI: 10.1080/02664763.2010.545109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
15
|
Parast L, Cheng SC, Cai T. Incorporating short-term outcome information to predict long-term survival with discrete markers. Biom J 2011; 53:294-307. [PMID: 21337601 DOI: 10.1002/bimj.201000150] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/26/2010] [Revised: 12/22/2010] [Accepted: 01/04/2011] [Indexed: 11/11/2022]
Abstract
In disease screening and prognosis studies, an important task is to determine useful markers for identifying high-risk subgroups. Once such markers are established, they can be incorporated into public health practice to provide appropriate strategies for treatment or disease monitoring based on each individual's predicted risk. In recent years, genetic and biological markers have been examined extensively for their potential to signal progression or risk of disease. In addition to these markers, it has often been argued that short-term outcomes may be helpful in making a better prediction of disease outcomes in clinical practice. In this paper we propose model-free non-parametric procedures to incorporate short-term event information to improve the prediction of a long-term terminal event. We include the optional availability of a single discrete marker measurement and assess the additional information gained by including the short-term outcome. We focus on the semi-competing risk setting where the short-term event is an intermediate event that may be censored by the terminal event while the terminal event is only subject to administrative censoring. Simulation studies suggest that the proposed procedures perform well in finite samples. Our procedures are illustrated using a data set of post-dialysis patients with end-stage renal disease.
Affiliation(s)
- Layla Parast
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.
|
16
|
Qin J, Zhang B. Best combination of multiple diagnostic tests for screening purposes. Stat Med 2010; 29:2905-19. [DOI: 10.1002/sim.4068] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]
|
17
|
Abstract
The diagnostic likelihood ratio function, DLR, is a statistical measure used to evaluate risk prediction markers. The goal of this paper is to develop new methods to estimate the DLR function. Furthermore, we show how risk prediction markers can be compared using rank-invariant DLR functions. Various estimators are proposed that accommodate cohort or case-control study designs. Performances of the estimators are compared using simulation studies. The methods are illustrated by comparing a lung function measure and a nutritional status measure for predicting subsequent onset of major pulmonary infection in children suffering from cystic fibrosis. For continuous markers, the DLR function is mathematically related to the slope of the receiver operating characteristic (ROC) curve, an entity used to evaluate diagnostic markers. We show that our methodology can be used to estimate the slope of the ROC curve and illustrate use of the estimated ROC derivative in variance and sample size calculations for a diagnostic biomarker study.
Affiliation(s)
- Wen Gu
- Department of Medical Science, Global Biostatistics and Epidemiology, Amgen, Los Angeles, CA 91320, USA
|
18
|
Yao F, Craiu RV, Reiser B. Nonparametric covariate adjustment for receiver operating characteristic curves. Can J Stat 2009. [DOI: 10.1002/cjs.10044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
|
19
|
Karliner LS, Napoles-Springer AM, Schillinger D, Bibbins-Domingo K, Pérez-Stable EJ. Identification of limited English proficient patients in clinical care. J Gen Intern Med 2008; 23:1555-60. [PMID: 18618200 PMCID: PMC2533382 DOI: 10.1007/s11606-008-0693-y] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Received: 10/09/2007] [Revised: 04/03/2008] [Accepted: 05/22/2008] [Indexed: 11/25/2022]
Abstract
BACKGROUND Standardized means to identify patients likely to benefit from language assistance are needed. OBJECTIVE To evaluate the accuracy of the U.S. Census English proficiency question (Census-LEP) in predicting patients' ability to communicate effectively in English. DESIGN We investigated the sensitivity and specificity of the Census-LEP alone or in combination with a question on preferred language for medical care for predicting patient-reported ability to discuss symptoms and understand physician recommendations in English. PARTICIPANTS Three hundred and two patients > 18 who spoke Spanish and/or English recruited from a cardiology clinic and an inpatient general medical-surgical ward in 2004-2005. RESULTS One hundred ninety-eight (66%) participants reported speaking English less than "very well" and 166 (55%) less than "well"; 157 (52%) preferred receiving their medical care in Spanish. Overall, 135 (45%) were able to discuss symptoms and 143 (48%) to understand physician recommendations in English. The Census-LEP with a high-threshold (less than "very well") had the highest sensitivity for predicting effective communication (100% Discuss; 98.7% Understand), but the lowest specificity (72.6% Discuss; 67.1% Understand). The composite measure of Census-LEP and preferred language for medical care provided a significant increase in specificity (91.9% Discuss; 83.9% Understand), with only a marginal decrease in sensitivity (99.4% Discuss; 96.7% Understand). CONCLUSIONS Using the Census-LEP item with a high-threshold of less than "very well" as a screening question, followed by a language preference for medical care question, is recommended for inclusive and accurate identification of patients likely to benefit from language assistance.
Affiliation(s)
- Leah S. Karliner
- Division of General Internal Medicine and Medical Effectiveness Research Center for Diverse Populations, Department of Medicine, University of California San Francisco (UCSF), San Francisco, CA USA
- Anna M. Napoles-Springer
- Division of General Internal Medicine and Medical Effectiveness Research Center for Diverse Populations, Department of Medicine, University of California San Francisco (UCSF), San Francisco, CA USA
- Dean Schillinger
- Division of General Internal Medicine, Department of Medicine, Center for Vulnerable Populations, San Francisco General Hospital, UCSF, San Francisco, CA USA
- Kirsten Bibbins-Domingo
- Division of General Internal Medicine, Department of Medicine, Center for Vulnerable Populations, San Francisco General Hospital, UCSF, San Francisco, CA USA
- Eliseo J. Pérez-Stable
- Division of General Internal Medicine and Medical Effectiveness Research Center for Diverse Populations, Department of Medicine, University of California San Francisco (UCSF), San Francisco, CA USA
|
20
|
Briggs WM, Zaretzki R. The Skill Plot: A Graphical Technique for Evaluating Continuous Diagnostic Tests. Biometrics 2008; 64:250-6; discussion 256-61. [DOI: 10.1111/j.1541-0420.2007.00781_1.x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
|
21
|
Yilmaz S, Isik I, Afrouzian M, Monroy M, Sar A, Benediktsson H, McLaughlin K. Evaluating the accuracy of functional biomarkers for detecting histological changes in chronic allograft nephropathy. Transpl Int 2007; 20:608-15. [PMID: 17521383 DOI: 10.1111/j.1432-2277.2007.00494.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
The most common cause of late kidney transplant failure is chronic allograft nephropathy (CAN). Much research has focused on identifying biomarkers (or correlates) that would predict subsequent CAN and allow timely intervention. Functional biomarkers such as serum creatinine and estimated glomerular filtration rate (eGFR) have been widely adopted, even though they have not been rigorously evaluated as surrogate markers. This study evaluated serum creatinine and eGFR for predicting the early histopathological changes seen in transplant protocol biopsies (TPB). We prospectively followed 289 kidney transplant patients in the Southern Alberta Transplant Program who had TPB at 6-12 months post-transplant. Tissue samples (n = 280) were independently examined by renal pathologists. The ability of serum creatinine or eGFR to predict the threshold level for abnormal histopathology was evaluated by calculating the area under the receiver operator characteristic curve. Serum creatinine and eGFR had poor predictive value (most confidence intervals included 0.5, indicating no predictive ability) for ten individual histological measurements (Banff 97 scores), and the Chronic Allograft Damage Index. We conclude that serum creatinine and eGFR have a limited clinical role in predicting the early histopathological changes that precede CAN and should not be used for this purpose.
Affiliation(s)
- Serdar Yilmaz
- Division of Transplantation, Department of Surgery, University of Calgary, Foothills Medical Centre, 1403-29 Street NW, Calgary, Alberta, Canada.
|
22
|
Zamboanga BL, Horton NJ, Tyler KMB, O'Riordan SS, Calvert BD, McCollum EC. The utility of the AUDIT in screening for drinking game involvement among female college students. J Adolesc Health 2007; 40:359-61. [PMID: 17367733 DOI: 10.1016/j.jadohealth.2006.11.139] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 07/25/2006] [Revised: 10/30/2006] [Accepted: 11/07/2006] [Indexed: 11/18/2022]
Abstract
Drinking games (DG) facilitate heavy alcohol consumption in a short period and are associated with negative experiences. We examined the utility of Alcohol Use Disorders Identification Test (AUDIT) cut-off scores to identify DG involvement. Findings indicated an AUDIT score of at least 5 is needed to identify gamers among students at a women's college. Prevention implications are discussed.
Affiliation(s)
- Byron L Zamboanga
- Department of Psychology, Smith College, Northampton, Massachusetts 01063, USA.
|
23
|
Abstract
In a few cases, such as early pregnancy tests, the test results are dichotomous; many diagnostic tests, however, give results which are not binary. In the diagnosis of prostate cancer, prostate-specific antigen test result is on a continuous scale; or, in radiology, assessment of mammograms is on an ordinal scale. In such cases, the accuracy of the marker or test is often first summarized in a receiver operating characteristic (ROC) curve and then as the area under that curve. The area under the ROC curve, however, only shows the 'potential' of a marker; sooner or later, for practical uses, we still need to dichotomize the test result so that we can classify subjects as 'diseased' or 'healthy'. Finding an 'optimal' cutpoint to dichotomize a continuous marker is desirable and is a very basic problem but, in all or most cases, cutpoints used in practice are arbitrary. The difficulty lies in our failure to define and justify a criterion for optimality. In this paper, we will propose a solution by maximizing a well-known parameter--the Youden's Index--within the framework of the ROC curve.
Affiliation(s)
- Chap T Le
- School of Public Health, Division of Biostatistics and Cancer Center, University of Minnesota, MMC 303, Minneapolis 55455, USA.
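The cutpoint criterion named in the abstract above, maximizing Youden's index J = sensitivity + specificity - 1 over candidate thresholds, can be illustrated with a minimal sketch. This is not the paper's own procedure: the function name `youden_cutpoint`, the empirical grid of observed values, and the convention that higher scores indicate disease are all assumptions made for illustration.

```python
import numpy as np

def youden_cutpoint(scores, labels):
    """Return (cutpoint, J) maximizing J = sensitivity + specificity - 1.

    Assumptions (not from the source): higher scores indicate disease,
    a subject is called positive when score >= cutpoint, and candidate
    cutpoints are the observed score values.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = diseased
    best_c, best_j = None, -np.inf
    for c in np.unique(scores):
        pred = scores >= c
        sens = np.mean(pred[labels])       # true positive rate
        spec = np.mean(~pred[~labels])     # true negative rate
        j = sens + spec - 1.0
        if j > best_j:                     # ties keep the smallest cutpoint
            best_c, best_j = c, j
    return best_c, best_j

# Toy data (hypothetical): four healthy and four diseased subjects.
scores = [1, 2, 3, 4, 3, 5, 6, 7]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cut, j = youden_cutpoint(scores, labels)
```

On this toy sample the empirical maximum is reached at a cutpoint of 5, where sensitivity is 0.75 and specificity is 1.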
|
24
|
Wan S, Zhang B. Smooth semiparametric receiver operating characteristic curves for continuous diagnostic tests. Stat Med 2007; 26:2565-86. [PMID: 17072821 DOI: 10.1002/sim.2726] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]
Abstract
We propose a semiparametric kernel distribution function estimator, based on which a new smooth semiparametric estimator of the receiver operating characteristic (ROC) curve is constructed. We derive the asymptotic bias and variance of the newly proposed distribution function estimator and show that it is more efficient than the traditional non-parametric kernel distribution estimator. We also derive the asymptotic bias and variance of our new ROC curve estimator and show that it is more efficient than the smooth non-parametric ROC curve estimator proposed by Zou et al. (Stat. Med. 1997; 16:2143-2156) and Lloyd (J. Am. Stat. Assoc. 1998; 93:1356-1364). For our proposed estimators, we derive data-based methods for bandwidth selection. In addition, we present some results on the analysis of two real data sets. Finally, a simulation study is presented to show that our estimators are better than the non-parametric counterparts in terms of bias, standard error, and mean-square error.
Affiliation(s)
- Shuwen Wan
- School of Life Sciences, Nanjing University, Nanjing 210093, China
|
25
|
Lin CY, Barnhart HX, Kosinski AS. The weighted generalized estimating equations approach for the evaluation of medical diagnostic test at subunit level. Biom J 2006; 48:758-71. [PMID: 17094341 DOI: 10.1002/bimj.200510199] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Sensitivity and specificity are common measures used to evaluate the performance of a diagnostic test. A diagnostic test is often administered at a subunit level, e.g. at the level of vessel, ear or eye of a patient so that the treatment can be targeted at the specific subunit. Therefore, it is essential to evaluate the diagnostic test at the subunit level. Often patients with more negative subunit test results are less likely to receive the gold standard tests than patients with more positive subunit test results. To account for this type of missing data and correlation between subunit test results, we proposed a weighted generalized estimating equations (WGEE) approach to evaluate subunit sensitivities and specificities. A simulation study was conducted to evaluate the performance of the WGEE estimators and the weighted least squares (WLS) estimators (Barnhart and Kosinski, 2003) under a missing at random assumption. The results suggested that the WGEE estimator is consistent under various scenarios of percentage of missing data and sample size, while the WLS approach could yield biased estimators due to a misspecified missing data mechanism. We illustrate the methodology with a cardiology example.
Affiliation(s)
- Carol Y Lin
- Department of Biostatistics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA.
|
26
|
Zhang B. A semiparametric hypothesis testing procedure for the ROC curve area under a density ratio model. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2005.02.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
|
27
|
Da Dalt L, Marchi AG, Laudizi L, Crichiutti G, Messi G, Pavanello L, Valent F, Barbone F. Predictors of intracranial injuries in children after blunt head trauma. Eur J Pediatr 2006; 165:142-8. [PMID: 16311740 DOI: 10.1007/s00431-005-0019-6] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Received: 04/04/2005] [Revised: 06/16/2005] [Accepted: 08/08/2005] [Indexed: 01/21/2023]
Abstract
This study was conducted to determine if clinical features can predict the risk of intracranial injury (ICI) in pediatric closed head trauma. We enrolled 3,806 children under 16 years consecutively referred for acute closed head trauma to the paediatric emergency room of five Italian children's hospitals. Relevant outcomes were death and diagnosis of ICI. Clinical symptoms and signs were evaluated as possible outcome predictors. Children were also classified into five groups according to their clinical presentation. The association of ICI with signs and symptoms and the appropriateness of the five-group classification in predicting the likelihood of ICI were evaluated by logistic regression analyses. ICI was diagnosed in 22 children; 2 of them died. The risk of fatal and nonfatal ICI was 0.5 and 5.2 per 1,000 children with closed head trauma respectively. Significant associations were found between ICI and loss of consciousness, prolonged headache, persistent drowsiness, abnormal mental status, focal neurological signs, signs of skull fracture in non-frontal areas and signs of basal skull fracture. The five-group classification of children allowed an excellent prediction in terms of likelihood of ICI (ROC area 0.972). CONCLUSIONS Selection of children with closed head trauma based on different combinations of signs and symptoms allows for early identification of subjects at different risk for ICI. In patients with minor head injuries, the absence of loss of consciousness, drowsiness, amnesia, prolonged headache, clinical evidence of basal or non-frontal skull fracture identified 100% of children without lesions. Validation of our results with a larger sample of patients with ICI would be highly desirable.
Affiliation(s)
- Liviana Da Dalt
- Dipartimento di Pediatria, Università di Padova, Via Giustiniani 3, 35128, Padova, Italy.
|
28
|
Abstract
BACKGROUND Markers that purport to distinguish subjects with a condition from those without a condition must be evaluated rigorously for their classification accuracy. A single approach for statistical evaluation and comparison of markers is not yet established. METHODS We suggest a standardization that uses the marker distribution in unaffected subjects as a reference. For an affected subject with marker value Y, the standardized placement value is the proportion of unaffected subjects with marker values that exceed Y. RESULTS We applied the standardization to 2 illustrative datasets. As a marker for pancreatic cancer, the CA-19-9 marker had smaller placement values than the CA-125 marker, indicating that CA-19-9 was the better marker. For detecting hearing impairment, the placement values for the test output (the marker) were smaller when the input sound stimulus was of lower intensity, which indicates that the test better distinguishes hearing-impaired from unimpaired ears when a lower intensity sound stimulus is used. Explicit connections are drawn between the distribution of standardized marker values and the receiver operating characteristic curve, one established statistical technique for evaluating classifiers. CONCLUSION The standardization is an intuitive procedure for evaluating markers. It facilitates direct and meaningful comparisons between markers. It also provides a new view of receiver operating characteristic analysis that may render it more accessible to those as yet unfamiliar with it. The general approach provides a statistical tool to address important questions that are typically not addressed in current marker research, such as quantifying and controlling for covariate effects.
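The standardization described in the abstract above (for an affected subject with marker value Y, the proportion of unaffected subjects whose marker values exceed Y) can be sketched in a few lines. The function name `placement_values` and the toy data are hypothetical. The link to the ROC curve used here is the standard identity that, for continuous markers without ties, the empirical AUC equals one minus the mean placement value.

```python
import numpy as np

def placement_values(affected, unaffected):
    """Placement value of each affected subject's marker value Y:
    the proportion of unaffected subjects with marker values > Y."""
    affected = np.asarray(affected, dtype=float)
    unaffected = np.asarray(unaffected, dtype=float)
    return np.array([np.mean(unaffected > y) for y in affected])

# Toy data (hypothetical), with no tied values.
affected = [3.0, 5.0]
unaffected = [1.0, 2.0, 4.0, 6.0]

pv = placement_values(affected, unaffected)
# With no ties, empirical AUC = 1 - mean placement value.
auc = 1.0 - pv.mean()
```

Smaller placement values mean the affected subjects sit further out in the upper tail of the unaffected distribution, which is why the abstract reads smaller values as indicating the better marker.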
|
29
|
Astrakas LG, Zurakowski D, Tzika AA, Zarifi MK, Anthony DC, De Girolami U, Tarbell NJ, Black PM. Noninvasive magnetic resonance spectroscopic imaging biomarkers to predict the clinical grade of pediatric brain tumors. Clin Cancer Res 2005; 10:8220-8. [PMID: 15623597 DOI: 10.1158/1078-0432.ccr-04-0603] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.1] [Indexed: 01/13/2023]
Abstract
The diagnosis and therapy of childhood brain tumors, most of which are low grade, can be complicated because of their frequent adjacent location to crucial structures, which limits diagnostic biopsy. Also, although new prognostic biomarkers identified by molecular analysis or DNA microarray gene profiling are promising, they too depend on invasive biopsy. Here, we test the hypothesis that combining information from biologically important intracellular molecules (biomarkers), noninvasively obtained by proton magnetic resonance spectroscopic imaging, will increase the diagnostic accuracy in determining the clinical grade of pediatric brain tumors. We evaluate the proton magnetic resonance spectroscopic imaging exams for 66 children with brain tumors. The intracellular biomarkers for choline-containing compounds (Cho), N-acetylaspartate, total creatine, and lipids and/or lactate were measured at the highest Cho region and normalized to the surrounding healthy tissue total creatine. Neuropathological grading was done with WHO criteria. Normalized Cho and lipids and/or lactate were elevated in high-grade (n = 23) versus low-grade (n = 43) tumors, which multiple logistic regression confirmed are independent predictors of tumor grade (for Cho, odds ratio 24.8, P < 0.001; and for lipids and/or lactate, odds ratio 4.4, P < 0.001). A linear combination of normalized Cho and lipids and/or lactate that maximizes diagnostic accuracy was calculated by maximizing the area under the receiver operating characteristic curve. Proton magnetic resonance spectroscopic imaging, although not a proxy for histology, provides noninvasive, in vivo biomarkers for predicting clinical grades of pediatric brain tumors.
Affiliation(s)
- Loukas G Astrakas
- Nuclear Magnetic Resonance Surgical Laboratory, Department of Surgery, Massachusetts General Hospital, Shriners Burns Institute, Harvard Medical School, Boston, Massachusetts 02114, USA
|
30
|
Tosteson TD, Buonaccorsi JP, Demidenko E, Wells WA. Measurement Error and Confidence Intervals for ROC Curves. Biom J 2005; 47:409-16. [PMID: 16161800 DOI: 10.1002/bimj.200310159] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
Measurement error in a continuous test variable may bias estimates of the summary properties of receiver operating characteristic (ROC) curves. Typically, unbiased measurement error will reduce the diagnostic potential of a continuous test variable. This paper explores the effects of possibly heterogeneous measurement error on estimated ROC curves for binormal test variables. Corrected estimators for specific points on the curve are derived under the assumption of known or estimated measurement variances for individual test results. These estimators and associated confidence intervals do not depend on normal assumptions for the distribution of the measurement error and are shown to be approximately unbiased for moderate size samples in a simulation study. An application from a study of emerging imaging modalities in breast cancer is used to demonstrate the new techniques.
|
31
|
Abstract
Modern technologies promise to provide new ways of diagnosing disease, detecting subclinical disease, predicting prognosis, selecting patient specific treatment, identifying subjects at risk for disease, and so forth. Advances in genomics, proteomics and imaging modalities in particular hold great potential for assisting with classification/prediction in medicine. Before a classifier can be adopted for routine use in health care, its classification accuracy must be determined. Standards for evaluating new clinical classifiers, however, lag far behind the well established standards that exist for evaluating new clinical treatments. In this paper, we discuss a phased approach to developing a new classifier (or biomarker). It mirrors the internationally established phase 1-2-3 paradigm for therapeutic drugs. The defined phases lead to a logical sequence of studies for classifier development. We emphasize that evaluating classification accuracy is fundamentally different from simply establishing association with outcome. Therefore, study objectives and designs differ from the familiar methods of clinical trials. We discuss these briefly for each phase. Finally, we argue that classifier development requires some rethinking of traditional data analysis techniques. As an example we show that maximizing the likelihood function to fit a logistic regression model to multiple predictors can yield a poor classifier. Instead we demonstrate that an approach that maximizes an alternative objective function characterizing classification accuracy performs better.
Affiliation(s)
- M S Pepe
- Department of Biostatistics, University of Washington, Seattle, 98109-1024, USA.
|
32
|
Mazumdar M. Group sequential design for comparative diagnostic accuracy studies: implications and guidelines for practitioners. Med Decis Making 2004; 24:525-33. [PMID: 15359002 DOI: 10.1177/0272989x04269240] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
PURPOSE Comparative diagnostic accuracy (CDA) studies are typically small retrospective studies supporting a higher accuracy for one modality over another for either staging a particular disease or assessing response to therapy, and they are used to generate hypotheses for larger prospective trials. The purpose of this article is to introduce the group sequential design (GSD) approach in planning these larger trials. METHODS The methodology needed for using GSD in CDA studies has recently been developed. In this article, GSD with the O'Brien and Fleming (OBF) stopping rule is described and guidelines for sample size calculation are provided. Simulated data are used to demonstrate the application of GSD in the design/analysis of a clinical trial in the CDA study setting. RESULTS The expected sample size needed for planning a trial with GSD (under the OBF stopping rule) is slightly inflated but may ultimately result in greater savings of patient resources. CONCLUSION GSD is a specialized statistical method that is helpful in balancing the ethical and financial advantages of stopping a study early against the risk of an incorrect conclusion and should be adopted for planning CDA studies.
Affiliation(s)
- Madhu Mazumdar
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 E. 63rd St., 3rd floor, New York, NY 10021, USA.
|
33
|
Do TT, Dibley MJ, D'Este C. Receiver operating characteristic analysis of body mass index to detect increased risk of functional morbidity in Vietnamese rural adults. Eur J Clin Nutr 2004; 58:1594-603. [PMID: 15226755 DOI: 10.1038/sj.ejcn.1602010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
OBJECTIVE To assess the performance of low BMI, and define optimal BMI cut-off values in order to detect fever-associated adult morbidity. DESIGN A cohort study of adults between 18 and 60 y in rural Vietnam, whose BMI and health status were assessed at baseline, and who were then monitored for illness events for 4 months. Nonparametric receiver operating characteristic (ROC) analysis was used to evaluate the performance of low BMI to detect the average number of restricted-days due to illness and to determine optimal cut-off values. SETTING A rural commune in the Red River Delta, northern Vietnam. SUBJECTS The study included 693 men and 739 women aged 18-60 y. RESULTS At baseline, 21% of the study participants had a BMI < 18.5 kg/m². As BMI progressively decreased, the percentage of participants experiencing morbidity with fever increased. The areas under the ROC curves for BMI were significantly greater than 0.5 for all levels of monthly average restricted-days of illness (MARDI) with fever, with best performance for >5 days/month. Excluding participants with acute or chronic disease at baseline improved the performance of BMI to detect MARDI with fever of >5 days (area under ROC curve 0.95; 95% CI 0.92, 0.99). With increasing levels of MARDI with fever, BMI cut-offs fell to 17.9 kg/m² when MARDI with fever was >5 days. CONCLUSIONS The ROC analysis demonstrates that low BMI performs well as a risk indicator of MARDI with fever of >5 days with an optimal BMI cut-off value of 17.9 kg/m².
Affiliation(s)
- T T Do
- Department of Education and Information, National Institute of Nutrition, Ministry of Health, Hanoi, Vietnam.
|
34
|
|
35
|
Kosinski AS, Barnhart HX. A global sensitivity analysis of performance of a medical diagnostic test when verification bias is present. Stat Med 2003; 22:2711-21. [PMID: 12939781 DOI: 10.1002/sim.1517] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
Current advances in technology provide less invasive or less expensive diagnostic tests for identifying disease status. When a diagnostic test is evaluated against an invasive or expensive gold standard test, one often finds that not all patients undergo the gold standard test. The sensitivity and specificity estimates based only on the patients with verified disease are often biased. This bias is called verification bias. Many authors have examined the consequences of verification bias and have proposed bias correction methods based on the assumption of independence between disease status and election for verification conditionally on the test result, or equivalently on the assumption that the disease status is missing at random using missing data terminology. This assumption may not be valid and one may need to consider adjustment for a possible non-ignorable verification bias resulting from the non-ignorable missing data mechanism. Such an adjustment involves ultimately uncheckable assumptions and requires sensitivity analysis. The sensitivity analysis is most often accomplished by perturbing parameters in the chosen model for the missing data mechanism, and it has a local flavour because perturbations are around the fitted model. In this paper we propose a global sensitivity analysis for assessing performance of a diagnostic test in the presence of verification bias. We derive a region of all sensitivity and specificity values consistent with the observed data and call this region a test ignorance region (TIR). The term 'ignorance' refers to the lack of knowledge due to the missing disease status for the not verified patients. The methodology is illustrated with two clinical examples.
Affiliation(s)
- Andrzej S Kosinski
- Department of Biostatistics, The Rollins School of Public Health of Emory University, 1518 Clifton Road, NE Atlanta, Georgia 30322, USA.
|
36
|
|
37
|
Barnhart HX, Kosinski AS. Evaluating medical diagnostic tests at the subunit level in the presence of verification bias. Stat Med 2003; 22:2161-76. [PMID: 12820281 DOI: 10.1002/sim.1436] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
We advocate that medical diagnostic tests should be evaluated at the subunit level instead of the patient level if a disease can occur in multiple parts/units within a patient, for example, vessels, segments, ears, eyes etc. When a non-invasive test is compared to an invasive gold standard test, often not all of the subunits receive the gold standard test and verification bias is present if the subunits without the gold standard test are discarded. Here we address estimation and inference issues in assessing the performance of medical diagnostic tests at the subunit level while accounting for verification bias and the correlation among subunits. We present a weighted least squares approach and demonstrate how the method can be implemented by using the procedure PROC CATMOD from the popular SAS software. A cardiology example is presented and we discuss application of the method to the case of multiple tests and a single gold standard test.
Affiliation(s)
- Huiman X Barnhart
- Department of Biostatistics, The Rollins School of Public Health of Emory University, 1518 Clifton Road, NE Atlanta, Georgia 30322, USA.
|
38
|
Abstract
BACKGROUND Most quantitative tests do not perfectly discriminate between subjects with and without a given disease and their results do not always allow certainty about disease status for diagnostic or screening purposes. We propose a method to construct a three-zone partition for quantitative tests to avoid the binary constraint of a 'black or white' decision that often does not fit the reality of clinical or screening practice. This partition intentionally includes a grey zone between positive and negative conclusions. METHODS AND RESULTS We show that the width of this grey zone depends on the difference between the means of test results for subjects with and without the disease, the variability of the test results and its components (biological, measurement), and the level of the misclassification risks (false positive, false negative) required by the context of use. We illustrate the method by application to the tuberculin skin test and iron deficiency markers in children. CONCLUSION This method can be used both to display the discriminatory performance of a quantitative test in a variety of contexts and to scrutinize its components of variability. Due to the simplicity of the graphical representations, the grey zone approach may be useful during the development of quantitative tests and the publication of their performance.
Affiliation(s)
- Joël Coste
- Département de Biostatistique, Pavillon Saint-Jacques, Hôpital COCHIN, 27 rue du Faubourg Saint-Jacques, 75674 Paris Cedex 14, France.
|
39
|
Kosinski AS, Barnhart HX. Accounting for nonignorable verification bias in assessment of diagnostic tests. Biometrics 2003; 59:163-71. [PMID: 12762453 DOI: 10.1111/1541-0420.00019] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
Abstract
A "gold" standard test, providing definitive verification of disease status, may be quite invasive or expensive. Current technological advances provide less invasive, or less expensive, diagnostic tests. Ideally, a diagnostic test is evaluated by comparing it with a definitive gold standard test. However, the decision to perform the gold standard test to establish the presence or absence of disease is often influenced by the results of the diagnostic test, along with other measured, or not measured, risk factors. If only data from patients who received the gold standard test were used to assess the test performance, the commonly used measures of diagnostic test performance--sensitivity and specificity--are likely to be biased. Sensitivity would often be higher, and specificity would be lower, than the true values. This bias is called verification bias. Without adjustment for verification bias, one may possibly introduce into the medical practice a diagnostic test with apparent, but not truly, high sensitivity. In this article, verification bias is treated as a missing covariate problem. We propose a flexible modeling and computational framework for evaluating the performance of a diagnostic test, with adjustment for nonignorable verification bias. The presented computational method can be utilized with any software that can repetitively use a logistic regression module. The approach is likelihood-based, and allows use of categorical or continuous covariates. An explicit formula for the observed information matrix is presented, so that one can easily compute standard errors of estimated parameters. The methodology is illustrated with a cardiology data example. We perform a sensitivity analysis of the dependency of verification selection process on disease.
Affiliation(s)
- Andrzej S Kosinski
- Department of Biostatistics, The Rollins School of Public Health of Emory University, 1518 Clifton Road, NE, Atlanta, Georgia 30322, USA.
|
40
|
Woodall WH, Koudelik R, Tsui KL, Kim SB, Stoumbos ZG, Carvounis CP. A Review and Analysis of the Mahalanobis-Taguchi System. Technometrics 2003. [DOI: 10.1198/004017002188618626] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.0] [Indexed: 11/21/2022]
|
41
|
Hajian-Tilaki KO, Hanley JA. Comparison of three methods for estimating the standard error of the area under the curve in ROC analysis of quantitative data. Acad Radiol 2002; 9:1278-85. [PMID: 12449360 DOI: 10.1016/s1076-6332(03)80561-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 01/12/2023]
Abstract
RATIONALE AND OBJECTIVES Several methods have been proposed for estimating the standard error (SE) of the area under the curve (AUC) in receiver operating characteristic analysis. The authors examined the validity of three methods--the LABROC procedure, exponential approximation, and the method of DeLong et al (purely nonparametric)--for estimating the SE of the AUC in receiver operating characteristic analysis of quantitative diagnostic data. MATERIALS AND METHODS The authors conducted a broad numerical investigation to assess how to estimate the SE of AUC in various configurations of binormal and nonbinormal pairs of distributions, in which one or both pair members were mixtures of Gaussian distributions (the samples included 100 in the diseased group and 100 in the nondiseased group). RESULTS The authors found that exponential approximation of the SE of AUC slightly underestimates the observed SE of a nonparametric estimate of the AUC when the ratio of the standard deviation of distributions for diseased to nondiseased populations was greater than 2. With binormal data the observed SE tended to be smaller with the LABROC procedure (semiparametric) than with the method of DeLong et al, but the LABROC procedure yields more conservative estimates of SE with nonbinormal data. In particular, with bimodal data it often produces a more conservative (ie, larger) estimate of the actual (observed) fluctuation. CONCLUSION Overall, the LABROC procedure and the method of DeLong et al yielded very close estimates of the SE of AUC, even with data generated from a nonbinormal model. The choice between these two methods can be based on users' preferences and practicality.
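As a rough illustration of one of the quantities compared in this abstract, the sketch below computes a nonparametric (Mann-Whitney) AUC and the exponential-approximation standard error in the form commonly attributed to Hanley and McNeil (1982). The function names are hypothetical and the code is not taken from the cited paper; it assumes that higher scores indicate disease.

```python
import math


def auc_mann_whitney(diseased, nondiseased):
    """Nonparametric AUC: share of (diseased, nondiseased) pairs ranked
    correctly, with ties counted as one half."""
    wins = 0.0
    for x in diseased:
        for y in nondiseased:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))


def se_auc_exponential(auc, n_diseased, n_nondiseased):
    """Exponential-approximation SE of the AUC (Hanley-McNeil form).

    Q1 and Q2 are the closed-form approximations obtained under a
    negative exponential model for the two score distributions.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_diseased - 1) * (q1 - auc ** 2)
        + (n_nondiseased - 1) * (q2 - auc ** 2)
    ) / (n_diseased * n_nondiseased)
    return math.sqrt(variance)


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    cases = [3, 4, 5]
    controls = [1, 2, 3]
    a = auc_mann_whitney(cases, controls)
    print(a, se_auc_exponential(a, len(cases), len(controls)))
```

The approximation needs only the AUC and the two group sizes, which is why it is convenient, whereas the DeLong method works from the individual scores.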
|
42
|
Moons KGM, Grobbee DE. When should we remain blind and when should our eyes remain open in diagnostic studies? J Clin Epidemiol 2002; 55:633-6. [PMID: 12160909 DOI: 10.1016/s0895-4356(02)00408-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.6] [Indexed: 10/27/2022]
Affiliation(s)
- Karel G M Moons
- Julius Center for General Practice and Patient Oriented Research, University Medical Center, P.O. Box 80035, 3508 TA, Utrecht, The Netherlands.
|
43
|
Gunnarsson RK, Lanke J. The predictive value of microbiologic diagnostic tests if asymptomatic carriers are present. Stat Med 2002; 21:1773-85. [PMID: 12111911 DOI: 10.1002/sim.1119] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
Abstract
If a proper gold standard is not available, then the predictive value of a test cannot be estimated. In this paper the concept of etiologic predictive value (EPV) is introduced. It is a quantity that will yield the predictive value of a test to predict presence of a specified disease in situations for which no proper gold standard is available. This is achieved by using information obtained from a healthy control population. This quantity requires that the marker in our test is present in all individuals having the specified disease, as in the case where the marker is the aetiologic factor for the specified disease. Furthermore, this quantity requires that asymptomatic carriers are present. This means that not all individuals with the marker have the specified disease. EPV is developed with special reference to the evaluation of bacterial cultures, or rapid tests to detect a bacterium, but the quantity might be used in other circumstances as well. EPV is applied to an example in which conventional throat culture is evaluated. Further information concerning EPV can be found at http://www.infovice.se/fou/epv.
Affiliation(s)
- Ronny K Gunnarsson
- Department of Primary Health Care, Göteborg University, Vasa Hospital, S-411 33 Gothenburg, Sweden.
|
44
|
Alonzo TA, Pepe MS, Moskowitz CS. Sample size calculations for comparative studies of medical tests for detecting presence of disease. Stat Med 2002; 21:835-52. [PMID: 11870820 DOI: 10.1002/sim.1058] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/07/2022]
Abstract
Technologic advances give rise to new tests for detecting disease in many fields, including cancer and sexually transmitted disease. Before a new disease screening test is approved for public use, its accuracy should be shown to be better than or at least not inferior to an existing test. Standards do not yet exist for designing and analysing studies to address this issue. Established principles for the design of therapeutic studies can be adapted for studies of screening tests. In particular, drawing upon methods for superiority and non-inferiority studies of therapeutic agents, we propose that confidence intervals for the relative accuracy of dichotomous tests drive the design of comparative studies of disease screening tests. We derive sample size formulae for a variety of designs, including studies where patients undergo several tests and studies where patients receive only one of the tests under evaluation. Both cohort and case-control study designs are considered. Modifications to the confidence intervals and sample size formulae are discussed to accommodate studies where, because of the invasive nature of definitive testing, true disease status can only be obtained for subjects who are positive on one or more of the screening tests. The methods proposed are applied to a study comparing a modified pap test to the conventional pap for cervical cancer screening. The impact of error in the gold standard reference test on the design and evaluation of comparative screening test studies is also discussed.
Affiliation(s)
- Todd A Alonzo
- Department of Preventive Medicine, Children's Oncology Group, University of Southern California, P.O. Box 60112, Arcadia, CA 91066, USA.
|
45
|
Lumley T, Heagerty P. Graphical Exploratory Analysis of Survival Data. J Comput Graph Stat 2000. [DOI: 10.1080/10618600.2000.10474910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
|
46
|
Abstract
The accuracy of a medical diagnostic test is often summarized in a receiver operating characteristic (ROC) curve. This paper puts forth an interpretation for each point on the ROC curve as being a conditional probability of a test result from a random diseased subject exceeding that from a random nondiseased subject. This interpretation gives rise to new methods for making inference about ROC curves. It is shown that inference can be achieved with binary regression techniques applied to indicator variables constructed from pairs of test results, one component of the pair being from a diseased subject and the other from a nondiseased subject. Within the generalized linear model (GLM) binary regression framework, ROC curves can be estimated, and we highlight a new semiparametric estimator. Covariate effects can also be evaluated with the GLM models. The methodology is applied to a pancreatic cancer dataset where we use the regression framework to compare two different serum biomarkers. Asymptotic distribution theory is developed to facilitate inference and to provide insight into factors influencing variability of estimated model parameters.
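The interpretation described in this abstract can be written out as follows. This is a sketch in notation assumed here rather than taken from the paper: $Y_D$ and $Y_{\bar D}$ are test results from a random diseased and a random nondiseased subject, $S_D$ and $S_{\bar D}$ are their survivor functions, and the results are taken to be continuous and independent.

```latex
% ROC point at false-positive rate t, written as a conditional probability
\[
\mathrm{ROC}(t) = S_D\!\left(S_{\bar D}^{-1}(t)\right)
= P\!\left(Y_D \ge Y_{\bar D} \,\middle|\, S_{\bar D}(Y_{\bar D}) = t\right),
\qquad t \in (0,1).
\]
% Pairwise indicators whose conditional mean is the ROC curve
\[
U_{ij} = I\!\left[\,Y_{D,i} \ge Y_{\bar D,j}\,\right],
\qquad
E\!\left[\,U_{ij} \,\middle|\, S_{\bar D}(Y_{\bar D,j}) = t\,\right] = \mathrm{ROC}(t).
\]
```

Because each $U_{ij}$ is binary with conditional mean $\mathrm{ROC}(t)$, a binary regression of the indicators on functions of $t$ is one way to arrive at the GLM framework the abstract refers to.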
Affiliation(s)
- M S Pepe
- Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
|
47
|
Abstract
ROC curves are a popular method for displaying sensitivity and specificity of a continuous diagnostic marker, X, for a binary disease variable, D. However, many disease outcomes are time dependent, D(t), and ROC curves that vary as a function of time may be more appropriate. A common example of a time-dependent variable is vital status, where D(t) = 1 if a patient has died prior to time t and zero otherwise. We propose summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, which we denote as ROC(t). A typical complexity with survival data is that observations may be censored. Two ROC curve estimators are proposed that can accommodate censored data. A simple estimator is based on using the Kaplan-Meier estimator for each possible subset X > c. However, this estimator does not guarantee the necessary condition that sensitivity and specificity are monotone in X. An alternative estimator that does guarantee monotonicity is based on a nearest neighbor estimator for the bivariate distribution function of (X, T), where T represents survival time (Akritas, M. J., 1994, Annals of Statistics 22, 1299-1327). We present an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer and an example where the ROC(t) curve displays the impact of modifying eligibility criteria for sample size and power in HIV prevention trials.
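The simple Kaplan-Meier-based estimator mentioned in this abstract can be sketched as below. This is an illustrative reading via Bayes' theorem, not the authors' code. The function names are hypothetical, `events` is assumed to be coded 1 for an observed event and 0 for censoring, and D(t) = 1 is taken to mean an event at or before time t.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv = 1.0
    # Multiply the conditional survival factors at each distinct event time <= t.
    for u in np.unique(times[(events == 1) & (times <= t)]):
        at_risk = np.sum(times >= u)
        died = np.sum((times == u) & (events == 1))
        surv *= 1.0 - died / at_risk
    return surv


def roc_point_km(marker, times, events, c, t):
    """Sensitivity and specificity at cutoff c for cumulative incidence by time t.

    Uses
      sens = {1 - S(t | X > c)} P(X > c) / {1 - S(t)}
      spec = 1 - S(t | X > c) P(X > c) / S(t)
    with Kaplan-Meier plug-ins. Undefined if S(t) is exactly 0 or 1.
    """
    marker = np.asarray(marker, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    above = marker > c
    p_above = above.mean()
    s_all = km_survival(times, events, t)
    # With nobody above the cutoff, p_above is 0 and the value of s_above is irrelevant.
    s_above = km_survival(times[above], events[above], t) if above.any() else 1.0
    sens = (1.0 - s_above) * p_above / (1.0 - s_all)
    spec = 1.0 - s_above * p_above / s_all
    return sens, spec


if __name__ == "__main__":
    # Toy data without censoring, invented for illustration only.
    marker = [1, 2, 3, 4]
    times = [5, 4, 2, 1]
    events = [1, 1, 1, 1]
    print(roc_point_km(marker, times, events, c=1.5, t=3))
```

Because a separate Kaplan-Meier curve is fitted for each subset X > c, nothing forces the resulting sensitivity and specificity to be monotone in c once censoring is present, which is the limitation the abstract notes for this estimator.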
Affiliation(s)
- P J Heagerty
- Department of Biostatistics, University of Washington, Seattle 98195, USA.
|
48
|
Catalán Arlandis JL, Jiménez Torres NV. Anthropometric and pharmacotherapeutic variables on acute emesis induced by cisplatin-containing chemotherapy. Ann Pharmacother 2000; 34:573-9. [PMID: 10852082 DOI: 10.1345/aph.19188] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/27/2022] Open
Abstract
OBJECTIVE To characterize the effects of anthropometric and pharmacotherapeutic variables on acute emesis induced by cisplatin-containing regimens with dosages ≥50 mg/m². METHODS A prospective, cross-sectional, noncontrolled study was performed to analyze acute vomiting during the first 24 hours in patients treated in a Spanish hospital. The patients received an intravenous combination of drugs (2 doses of metoclopramide 3 mg/kg, dexamethasone 20 mg) as first-choice antiemetic therapy. Intravenous ondansetron 8 mg and dexamethasone 20 mg served as an alternative regimen in patients <30 years old with a history of extrapyramidal manifestations or emesis in previous cycles. Therapeutic failure was used as a dependent variable, defined as three or more vomiting episodes documented by the patients. Other variables were the chemotherapeutic regimen; antiemetic regimen; patient gender, age, weight, and height; and cycle number. The reference logistic model and two reduced models derived from it were designed. The logistic models were subsequently validated by means of receiver operating characteristic curves. RESULTS A total of 319 cycles involving 106 patients were studied. The metoclopramide regimen was administered in 66% of the cycles. The therapeutic failure rate was 21% for the metoclopramide regimen and 32% for the ondansetron treatment. The logistic model developed identified the type of chemotherapeutic regimen provided as the most significant prognostic variable (p < 0.0001). Patient weight (odds ratio 1.64) and height (odds ratio 1.28) were identified as prognostic factors related to therapeutic failure. CONCLUSIONS The type of chemotherapeutic regimen administered and the anthropometric characteristics of the patients exert a clear conditioning effect on risks associated with therapeutic failure against acute emesis following high-dose cisplatin therapy. Such anthropometric parameters have not been previously identified as prognostic factors.
|
49
|
Farr BM, Shapiro DE. Diagnostic tests: distinguishing good tests from bad and even ugly ones. Infect Control Hosp Epidemiol 2000; 21:278-84. [PMID: 10782593 DOI: 10.1086/501760] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.9] [Indexed: 11/03/2022]
Abstract
This article focuses on the selection and interpretation of diagnostic tests, emphasizing the importance of understanding how their mathematical parameters affect the information they provide in various settings. The utility and limitations of sensitivity, specificity, predictive value, and receiver operating characteristic (ROC) curves are discussed using catheter-related bloodstream infections as an example. ROC curves have been used for selecting optimal cutoff values for a positive result and for selecting among several alternative diagnostic tests. For example, 16 different tests have been proposed for diagnosis of catheter-related bloodstream infection; ROC analysis provides an effective way to determine which test offers the best overall performance.
Affiliation(s)
- B M Farr
- University of Virginia Health System, Charlottesville 22908-0473, USA
|
50
|
Abstract
Laboratory diagnostic tests are central in the practice of modern medicine. Common uses include screening a specific population for evidence of disease and confirming or ruling out a tentative diagnosis in an individual patient. The interpretation of a diagnostic test result depends on both the ability of the test to distinguish diseased from nondiseased subjects and the particular characteristics of the patient and setting in which the test is being used. This article reviews statistical methodology for assessing laboratory diagnostic test accuracy and interpreting individual test results, with an emphasis on diagnostic tests that yield a continuous measurement. The article begins with a summary of basic concepts and terminology, then briefly discusses study design and reviews methods for assessing the accuracy of a single diagnostic test, comparing the accuracy of two or more diagnostic tests and interpreting individual test results.
Affiliation(s)
- D E Shapiro
- Center for Biostatistics in AIDS Research, Harvard School of Public Health, Boston, Massachusetts 02115-6017, USA.
|